
LLVM: A Compilation Framework for

Lifelong Program Analysis & Transformation

Chris Lattner Vikram Adve


University of Illinois at Urbana-Champaign
{lattner,vadve}@cs.uiuc.edu
http://llvm.cs.uiuc.edu/

ABSTRACT

This paper describes LLVM (Low Level Virtual Machine), a compiler framework designed to support transparent, lifelong program analysis and transformation for arbitrary programs, by providing high-level information to compiler transformations at compile-time, link-time, run-time, and in idle time between runs. LLVM defines a common, low-level code representation in Static Single Assignment (SSA) form, with several novel features: a simple, language-independent type-system that exposes the primitives commonly used to implement high-level language features; an instruction for typed address arithmetic; and a simple mechanism that can be used to implement the exception handling features of high-level languages (and setjmp/longjmp in C) uniformly and efficiently. The LLVM compiler framework and code representation together provide a combination of key capabilities that are important for practical, lifelong analysis and transformation of programs. To our knowledge, no existing compilation approach provides all these capabilities. We describe the design of the LLVM representation and compiler framework, and evaluate the design in three ways: (a) the size and effectiveness of the representation, including the type information it provides; (b) compiler performance for several interprocedural problems; and (c) illustrative examples of the benefits LLVM provides for several challenging compiler problems.

1. INTRODUCTION

Modern applications are increasing in size, change their behavior significantly during execution, support dynamic extensions and upgrades, and often have components written in multiple different languages. While some applications have small hot spots, others spread their execution time evenly throughout the application [14]. In order to maximize the efficiency of all of these programs, we believe that program analysis and transformation must be performed throughout the lifetime of a program. Such “lifelong code optimization” techniques encompass interprocedural optimizations performed at link-time (to preserve the benefits of separate compilation), machine-dependent optimizations at install time on each system, dynamic optimization at runtime, and profile-guided optimization between runs (“idle time”) using profile information collected from the end-user.

Program optimization is not the only use for lifelong analysis and transformation. Other applications of static analysis are fundamentally interprocedural, and are therefore most convenient to perform at link-time (examples include static debugging, static leak detection [24], and memory management transformations [30]). Sophisticated analyses and transformations are being developed to enforce program safety, but must be done at software installation time or load-time [19]. Allowing lifelong reoptimization of the program gives architects the power to evolve processors and exposed interfaces in more flexible ways [11, 20], while allowing legacy applications to run well on new systems.

This paper presents LLVM — Low-Level Virtual Machine — a compiler framework that aims to make lifelong program analysis and transformation available for arbitrary software, and in a manner that is transparent to programmers. LLVM achieves this through two parts: (a) a code representation with several novel features that serves as a common representation for analysis, transformation, and code distribution; and (b) a compiler design that exploits this representation to provide a combination of capabilities that is not available in any previous compilation approach we know of.

The LLVM code representation describes a program using an abstract RISC-like instruction set but with key higher-level information for effective analysis. This includes type information, explicit control flow graphs, and an explicit dataflow representation (using an infinite, typed register set in Static Single Assignment form [15]). There are several novel features in the LLVM code representation: (a) A low-level, language-independent type system that can be used to implement data types and operations from high-level languages, exposing their implementation behavior to all stages of optimization. This type system includes the type information used by sophisticated (but language-independent) techniques, such as algorithms for pointer analysis, dependence analysis, and data transformations. (b) Instructions for performing type conversions and low-level address arithmetic while preserving type information. (c) Two low-level exception-handling instructions for implementing language-specific exception semantics, while explicitly exposing exceptional control flow to the compiler.
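To make these features concrete before Section 2 defines them, the following hand-written fragment is a sketch in the early LLVM syntax used by this paper's later figures (the function and register names are our own, and the sketch is illustrative, not taken from the paper). It shows typed virtual registers, an explicit CFG of labeled basic blocks, and phi instructions merging loop-carried values in SSA form while summing the integers 0 through n-1:

```
int %triangle(int %n) {
entry:
    %go = setlt int 0, %n                           ; is the loop entered at all?
    br bool %go, label %loop, label %done
loop:
    %i   = phi int [ 0, %entry ], [ %i.next, %loop ]    ; SSA merge of loop-carried values
    %sum = phi int [ 0, %entry ], [ %sum.next, %loop ]
    %sum.next = add int %sum, %i
    %i.next   = add int %i, 1
    %again = setlt int %i.next, %n
    br bool %again, label %loop, label %done        ; terminator names both successors
done:
    %result = phi int [ 0, %entry ], [ %sum.next, %loop ]
    ret int %result
}
```

Note that every register is assigned exactly once and carries a type, and every basic block ends in a terminator that explicitly names its successors — the properties the surrounding text describes.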
The LLVM representation is source-language-independent, for two reasons. First, it uses a low-level instruction set and memory model that are only slightly richer than standard assembly languages, and the type system does not prevent representing code with little type information. Second, it does not impose any particular runtime requirements or semantics on programs. Nevertheless, it is important to note that LLVM is not intended to be a universal compiler IR. In particular, LLVM does not represent high-level language features directly (so it cannot be used for some language-dependent transformations), nor does it capture machine-dependent features or code sequences used by back-end code generators (it must be lowered to do so).

Because of the differing goals and representations, LLVM is complementary to high-level virtual machines (e.g., Smalltalk [18], Self [43], JVM [32], Microsoft’s CLI [33], and others), and not an alternative to these systems. It differs from these in three key ways. First, LLVM has no notion of high-level constructs such as classes, inheritance, or exception-handling semantics, even when compiling source languages with these features. Second, LLVM does not specify a runtime system or particular object model: it is low-level enough that the runtime system for a particular language can be implemented in LLVM itself. Indeed, LLVM can be used to implement high-level virtual machines. Third, LLVM does not guarantee type safety, memory safety, or language interoperability any more than the assembly language for a physical processor does.

The LLVM compiler framework exploits the code representation to provide a combination of five capabilities that we believe are important in order to support lifelong analysis and transformation for arbitrary programs. In general, these capabilities are quite difficult to obtain simultaneously, but the LLVM design does so inherently:

(1) Persistent program information: The compilation model preserves the LLVM representation throughout an application’s lifetime, allowing sophisticated optimizations to be performed at all stages, including runtime and idle time between runs.

(2) Offline code generation: Despite the last point, it is possible to compile programs into efficient native machine code offline, using expensive code generation techniques not suitable for runtime code generation. This is crucial for performance-critical programs.

(3) User-based profiling and optimization: The LLVM framework gathers profiling information at run-time in the field so that it is representative of actual users, and can apply it for profile-guided transformations both at run-time and in idle time¹.

(4) Transparent runtime model: The system does not specify any particular object model, exception semantics, or runtime environment, thus allowing any language (or combination of languages) to be compiled using it.

(5) Uniform, whole-program compilation: Language-independence makes it possible to optimize and compile all code comprising an application in a uniform manner (after linking), including language-specific runtime libraries and system libraries.

¹ An idle-time optimizer has not yet been implemented in LLVM.

We believe that no previous system provides all five of these properties. Source-level compilers provide #2 and #4, but do not attempt to provide #1, #3 or #5. Link-time interprocedural optimizers [21, 5, 26], common in commercial compilers, provide the additional capability of #1 and #5 but only up to link-time. Profile-guided optimizers for static languages provide benefit #2 at the cost of transparency, and most crucially do not provide #3. High-level virtual machines such as JVM or CLI provide #3 and partially provide #1 and #5, but do not aim to provide #4, and either do not provide #2 at all or provide it without #1 or #3. Binary runtime optimization systems provide #2, #4 and #5, but provide #3 only at runtime and to a limited extent, and most importantly do not provide #1. We explain these comparisons in more detail in Section 3.

We evaluate the effectiveness of the LLVM system with respect to three issues: (a) the size and effectiveness of the representation, including the ability to extract useful type information for C programs; (b) the compiler performance (not the performance of generated code, which depends on the particular code generator or optimization sequences used); and (c) examples illustrating the key capabilities LLVM provides for several challenging compiler problems.

Our experimental results show that the LLVM compiler can extract reliable type information for an average of 68% of the static memory access instructions across a range of SPECINT 2000 C benchmarks, and for virtually all the accesses in more disciplined programs. We also discuss, based on our experience, how the type information captured by LLVM is enough to safely perform a number of aggressive transformations that would traditionally be attempted only on type-safe languages in source-level compilers. Code size measurements show that the LLVM representation is comparable in size to X86 machine code (a CISC architecture) and roughly 25% smaller than RISC code on average, despite capturing much richer type information as well as an infinite register set in SSA form. Finally, we present example timings showing that the LLVM representation supports extremely fast interprocedural optimizations.

Our implementation of LLVM to date supports C and C++, which are traditionally compiled entirely statically. We are currently exploring whether LLVM can be beneficial for implementing dynamic runtimes such as JVM and CLI. LLVM is freely available under a non-restrictive license².

² See the LLVM home-page: http://llvm.cs.uiuc.edu/.

The rest of this paper is organized as follows. Section 2 describes the LLVM code representation. Section 3 then describes the design of the LLVM compiler framework. Section 4 discusses our evaluation of the LLVM system as described above. Section 5 compares LLVM with related previous systems. Section 6 concludes with a summary of the paper.

2. PROGRAM REPRESENTATION

The code representation is one of the key factors that differentiates LLVM from other systems. The representation is designed to provide high-level information about programs that is needed to support sophisticated analyses and transformations, while being low-level enough to represent arbitrary programs and to permit extensive optimization in static compilers. This section gives an overview of the LLVM instruction set and describes the language-independent type
system, the memory model, exception handling mechanisms, and the offline and in-memory representations. The detailed syntax and semantics of the representation are defined in the LLVM reference manual [29].

2.1 Overview of the LLVM Instruction Set

The LLVM instruction set captures the key operations of ordinary processors but avoids machine-specific constraints such as physical registers, pipelines, and low-level calling conventions. LLVM provides an infinite set of typed virtual registers which can hold values of primitive types (Boolean, integer, floating point, and pointer). The virtual registers are in Static Single Assignment (SSA) form [15]. LLVM is a load/store architecture: programs transfer values between registers and memory solely via load and store operations using typed pointers. The LLVM memory model is described in Section 2.3.

The entire LLVM instruction set consists of only 31 opcodes. This is possible because, first, we avoid multiple opcodes for the same operations³. Second, most opcodes in LLVM are overloaded (for example, the add instruction can operate on operands of any integer or floating point operand type). Most instructions, including all arithmetic and logical operations, are in three-address form: they take one or two operands and produce a single result.

³ For example, there are no unary operators: not and neg are implemented in terms of xor and sub, respectively.

LLVM uses SSA form as its primary code representation, i.e., each virtual register is written in exactly one instruction, and each use of a register is dominated by its definition. Memory locations in LLVM are not in SSA form because many possible locations may be modified at a single store through a pointer, making it difficult to construct a reasonably compact, explicit SSA code representation for such locations. The LLVM instruction set includes an explicit phi instruction, which corresponds directly to the standard (non-gated) φ function of SSA form. SSA form provides a compact def-use graph that simplifies many dataflow optimizations and enables fast, flow-insensitive algorithms to achieve many of the benefits of flow-sensitive algorithms without expensive dataflow analysis. Non-loop transformations in SSA form are further simplified because they do not encounter anti- or output dependences on SSA registers. Non-memory transformations are also greatly simplified because (unrelated to SSA) registers cannot have aliases.

LLVM also makes the Control Flow Graph (CFG) of every function explicit in the representation. A function is a set of basic blocks, and each basic block is a sequence of LLVM instructions, ending in exactly one terminator instruction (branches, return, unwind, or invoke; the latter two are explained later below). Each terminator explicitly specifies its successor basic blocks.

2.2 Language-independent Type Information, Cast, and GetElementPtr

One of the fundamental design features of LLVM is the inclusion of a language-independent type system. Every SSA register and explicit memory object has an associated type, and all operations obey strict type rules. This type information is used in conjunction with the instruction opcode to determine the exact semantics of an instruction (e.g. floating point vs. integer add). This type information enables a broad class of high-level transformations on low-level code (for example, see Section 4.1.1). In addition, type mismatches are useful for detecting optimizer bugs.

The LLVM type system includes source-language-independent primitive types with predefined sizes (void, bool, signed/unsigned integers from 8 to 64 bits, and single- and double-precision floating-point types). This makes it possible to write portable code using these types, though non-portable code can be expressed directly as well. LLVM also includes (only) four derived types: pointers, arrays, structures, and functions. We believe that most high-level language data types are eventually represented using some combination of these four types in terms of their operational behavior. For example, C++ classes with inheritance are implemented using structures, functions, and arrays of function pointers, as described in Section 4.1.2.

Equally important, the four derived types above capture the type information used even by sophisticated language-independent analyses and optimizations. For example, field-sensitive points-to analyses [25, 31], call graph construction (including for object-oriented languages like C++), scalar promotion of aggregates, and structure field reordering transformations [12] only use pointers, structures, functions, and primitive data types, while array dependence analysis and loop transformations use all those plus array types.

Because LLVM is language independent and must support weakly-typed languages, declared type information in a legal LLVM program may not be reliable. Instead, some pointer analysis algorithm must be used to distinguish memory accesses for which the type of the pointer target is reliably known from those for which it is not. LLVM includes such an analysis, described in Section 4.1.1. Our results show that despite allowing values to be arbitrarily cast to other types, reliable type information is available for a large fraction of memory accesses in C programs compiled to LLVM.

The LLVM ‘cast’ instruction is used to convert a value of one type to another arbitrary type, and is the only way to perform such conversions. Casts thus make all type conversions explicit, including type coercion (there are no mixed-type operations in LLVM), explicit casts for physical subtyping, and reinterpreting casts for non-type-safe code. A program without casts is necessarily type-safe (in the absence of memory access errors, e.g., array overflow [19]).

A critical difficulty in preserving type information for low-level code is implementing address arithmetic. The getelementptr instruction is used by the LLVM system to perform pointer arithmetic in a way that both preserves type information and has machine-independent semantics. Given a typed pointer to an object of some aggregate type, this instruction calculates the address of a sub-element of the object in a type-preserving manner (effectively a combined ‘.’ and ‘[ ]’ operator for LLVM). For example, the C statement “X[i].a = 1;” could be translated into the pair of LLVM instructions:

%p = getelementptr %xty* %X, long %i, ubyte 3;
store int 1, int* %p;

where we assume a is field number 3 within the structure X[i], and the structure is of type %xty. Making all address arithmetic explicit is important so that it is exposed to all LLVM optimizations (most importantly, reassociation and redundancy elimination); getelementptr achieves this without obscuring the type information. Load and store instructions take a single pointer and do not perform any indexing,
which makes the processing of memory accesses simple and uniform.

2.3 Explicit Memory Allocation and Unified Memory Model

LLVM provides instructions for typed memory allocation. The malloc instruction allocates one or more elements of a specific type on the heap, returning a typed pointer to the new memory. The free instruction releases memory allocated through malloc⁴. The alloca instruction is similar to malloc except that it allocates memory in the stack frame of the current function instead of the heap, and the memory is automatically deallocated on return from the function. All stack-resident data (including “automatic” variables) are allocated explicitly using alloca.

⁴ When native code is generated for a program, malloc and free instructions are converted to the appropriate native function calls, allowing custom memory allocators to be used.

In LLVM, all addressable objects (“lvalues”) are explicitly allocated. Global variable and function definitions define a symbol which provides the address of the object, not the object itself. This gives a unified memory model in which all memory operations, including call instructions, occur through typed pointers. There are no implicit accesses to memory, simplifying memory access analysis, and the representation needs no “address of” operator.

2.4 Function Calls and Exception Handling

For ordinary function calls, LLVM provides a call instruction that takes a typed function pointer (which may be a function name or an actual pointer value) and typed actual arguments. This abstracts away the calling conventions of the underlying machine and simplifies program analysis.

One of the most unusual features of LLVM is that it provides an explicit, low-level, machine-independent mechanism to implement exception handling in high-level languages. In fact, the same mechanism also supports setjmp and longjmp operations in C, allowing these operations to be analyzed and optimized in the same way that exception features in other languages are. The common exception mechanism is based on two instructions, invoke and unwind.

The invoke and unwind instructions together support an abstract exception handling model logically based on stack unwinding (though LLVM-to-native code generators may use either “zero cost” table-driven methods [9] or setjmp/longjmp to implement the instructions). invoke is used to specify exception handling code that must be executed during stack unwinding for an exception. unwind is used to throw an exception or to perform a longjmp. We first describe the mechanisms and then describe how they can be used for implementing exception handling.

The invoke instruction works just like a call, but specifies an extra basic block that indicates the starting block for an unwind handler. When the program executes an unwind instruction, it logically unwinds the stack until it removes an activation record created by an invoke. It then transfers control to the basic block specified by the invoke. These two instructions expose exceptional control flow in the LLVM CFG.

These two primitives can be used to implement a wide variety of exception handling mechanisms. To date, we have implemented full support for C’s setjmp/longjmp calls and the C++ exception model; in fact, both coexist cleanly in our implementation [13]. At a call site, if some code must be executed when an exception is thrown (for example, setjmp, “catch” blocks, or automatic variable destructors in C++), the code uses the invoke instruction for the call. When an exception is thrown, this causes the stack unwinding to stop in the current function, execute the desired code, then continue execution or unwinding as appropriate.

{
    AClass Obj;  // Has a destructor
    func();      // Might throw; must execute destructor
    ...
}

Figure 1: C++ exception handling example

For example, consider Figure 1, which shows a case where “cleanup code” needs to be generated by the C++ front-end. If the ‘func()’ call throws an exception, C++ guarantees that the destructor for the Obj object will be run. To implement this, an invoke instruction is used to halt unwinding, the destructor is run, then unwinding is continued with the unwind instruction. The generated LLVM code is shown in Figure 2. Note that a front-end for Java would use similar code to unlock locks that are acquired through synchronized blocks or methods when exceptions are thrown.

...
; Allocate stack space for object:
%Obj = alloca %AClass, uint 1
; Construct object:
call void %AClass::AClass(%AClass* %Obj)
; Call ``func()'':
invoke void %func() to label %OkLabel
       unwind to label %ExceptionLabel
OkLabel:
; ... execution continues...
ExceptionLabel:
; If unwind occurs, execution continues
; here. First, destroy the object:
call void %AClass::~AClass(%AClass* %Obj)
; Next, continue unwinding:
unwind

Figure 2: LLVM code for the C++ example. The handler code specified by invoke executes the destructor.

A key feature of our approach is that the complex, language-specific details of what code must be executed to throw and recover from exceptions are isolated to the language front-end and language-specific runtime library (so they do not complicate the LLVM representation), yet the exceptional control flow due to stack unwinding is encoded within the application code and therefore exposed in a language-independent manner to the optimizer. The C++ exception handling model is very complicated, supporting many related features such as try/catch blocks, checked exception specifications, function try blocks, etc., and requiring complex semantics for the dynamic lifetime of an exception object. The C++ front-end supports these semantics by generating calls to a simple runtime library.

For example, consider the expression ‘throw 1’. This constructs and throws an exception with integer type. The generated LLVM code is shown in Figure 3. The example code illustrates the key feature mentioned above. First, the runtime handles all of the implementation-specific details, such as allocating memory for exceptions⁵. Second, the runtime

⁵ For example, the implementation has to be careful to reserve space for throwing std::bad_alloc exceptions.
; Allocate an exception object
%t1 = call sbyte* %__llvm_cxxeh_alloc_exc(uint 4)
%t2 = cast sbyte* %t1 to int*
; Construct the thrown value into the memory
store int 1, int* %t2
; ``Throw'' an integer expression, specifying the
; exception object, the typeid for the object, and
; the destructor for the exception (null for int).
call void %__llvm_cxxeh_throw(sbyte* %t1,
                              <typeinfo for int>,
                              void (sbyte*)* null)
unwind  ; Unwind the stack.

Figure 3: LLVM code uses a runtime library for C++ exceptions support while exposing control-flow.

functions manipulate the thread-local state of the exception handling runtime, but don’t actually unwind the stack. Because the calling code performs the stack unwind, the optimizer has a better view of the control flow of the function without having to perform interprocedural analysis. This allows LLVM to turn stack unwinding operations into direct branches when the unwind target is the same function as the unwinder (this often occurs due to inlining, for example).

Finally, try/catch blocks are implemented in a straightforward manner, using the same mechanisms and runtime support. Any function call within the try block becomes an invoke. Any throw within the try-block becomes a call to the runtime library (as in the example above), followed by an explicit branch to the appropriate catch block. The “catch block” then uses the C++ runtime library to determine if the top-level current exception is of one of the types that is handled in the catch block. If so, it transfers control to the appropriate block; otherwise, it calls unwind to continue unwinding. The runtime library handles the language-specific semantics of determining whether the current exception is of a caught type.

2.5 Plain-text, Binary, and In-memory Representations

The LLVM representation is a first-class language which defines equivalent textual, binary, and in-memory (i.e., compiler’s internal) representations. The instruction set is designed to serve effectively both as a persistent, offline code representation and as a compiler internal representation, with no semantic conversions needed between the two⁶. Being able to convert LLVM code between these representations without information loss makes debugging transformations much simpler, allows test cases to be written easily, and decreases the amount of time required to understand the in-memory representation.

⁶ In contrast, typical JVM implementations convert from the stack-based bytecode language used offline to an appropriate representation for compiler transformations, and some even convert to SSA form for this purpose (e.g., [8]).

3. COMPILER ARCHITECTURE

The goal of the LLVM compiler framework is to enable sophisticated transformations at link-time, install-time, run-time, and idle-time, by operating on the LLVM representation of a program at all stages. To be practical, however, it must be transparent to application developers and end-users, and it must be efficient enough for use with real-world applications. This section describes how the overall system and the individual components are designed to achieve all these goals.

3.1 High-Level Design of the LLVM Compiler Framework

Figure 4 shows the high-level architecture of the LLVM system. Briefly, static compiler front-ends emit code in the LLVM representation, which is combined together by the LLVM linker. The linker performs a variety of link-time optimizations, especially interprocedural ones. The resulting LLVM code is then translated to native code for a given target at link-time or install-time, and the LLVM code is saved with the native code. (It is also possible to translate LLVM code at runtime with a just-in-time translator.) The native code generator inserts light-weight instrumentation to detect frequently executed code regions (currently loop nests and traces, but potentially also functions), and these can be optimized at runtime. The profile data collected at runtime represent the end-user’s (not the developer’s) runs, and can be used by an offline optimizer to perform aggressive profile-driven optimizations in the field during idle-time, tailored to the specific target machine.

This strategy provides five benefits that are not available in the traditional model of static compilation to native machine code. We argued in the Introduction that these capabilities are important for lifelong analysis and transformation, and we named them:

1. persistent program information,
2. offline code generation,
3. user-based profiling and optimization,
4. transparent runtime model, and
5. uniform, whole-program compilation.

These are difficult to obtain simultaneously for at least two reasons. First, offline code generation (#2) normally does not allow optimization at later stages on the higher-level representation instead of native machine code (#1 and #3). Second, lifelong compilation has traditionally been associated only with bytecode-based languages, which do not provide #4 and often not #2 or #5.

In fact, we noted in the Introduction that no existing compilation approach provides all the capabilities listed above. Our reasons are as follows:

• Traditional source-level compilers provide #2 and #4, but do not attempt #1, #3 or #5. They do provide interprocedural optimization, but require significant changes to application Makefiles.

• Several commercial compilers provide the additional benefit of #1 and #5 at link-time by exporting their intermediate representation to object files [21, 5, 26] and performing optimizations at link-time. No such system we know of is also capable of preserving its representation for runtime or idle-time use (benefits #1 and #3).

• Higher-level virtual machines like JVM and CLI provide benefit #3 and partially provide #1 (in particular, they focus on runtime optimization, because the need for bytecode verification greatly restricts the optimizations that may be done before runtime [3]). CLI partially provides #5 because it can support code in multiple languages, but any low-level system code and
exe &
Libraries LLVM
Offline Reoptimizer
exe &
LLVM LLVM
Compiler FE 1 LLVM Native exe Profile
. CPU Profile Info
LLVM
Linker CodeGen & Trace
. .o files
IPO/IPA LLVM
exe Info Runtime
Compiler FE N JIT LLVM Optimizer
LLVM LLVM

Figure 4: LLVM system architecture diagram

code in non-conforming languages is executed as "unmanaged code". Such code is represented in native form and not in the CLI intermediate representation, so it is not exposed to CLI optimizations. These systems do not provide #2 with #1 or #3 because runtime optimization is generally only possible when using JIT code generation. They do not aim to provide #4, and instead provide a rich runtime framework for languages that match their runtime and object model, e.g., Java and C#. Omniware [1] provides #5 and most of the benefits of #2 (because, like LLVM, it uses a low-level representation that permits extensive static optimization), but at the cost of not providing information for high-level analysis and optimization (i.e., #1). It does not aim to provide #3 or #4.

• Transparent binary runtime optimization systems like Dynamo and the runtime optimizers in Transmeta processors provide benefits #2, #4 and #5, but they do not provide #1. They provide benefit #3 only at runtime, and only to a limited extent, because they work only on native binary code, limiting the optimizations they can perform.

• Profile-Guided Optimization for static languages provides benefit #3 at the cost of not being transparent (it requires a multi-phase compilation process). Additionally, PGO suffers from three problems: (1) Empirically, developers are unlikely to use PGO, except when compiling benchmarks. (2) When PGO is used, the application is tuned to the behavior of the training run. If the training run is not representative of the end-user's usage patterns, performance may not improve and may even be hurt by the profile-driven optimization. (3) The profiling information is completely static, meaning that the compiler cannot make use of phase behavior in the program or adapt to changing usage patterns.

There are also significant limitations of the LLVM strategy. First, language-specific optimizations must be performed in the front-end before generating LLVM code. LLVM is not designed to represent source-language types or features directly. Second, it is an open question whether languages requiring sophisticated runtime systems, such as Java, can benefit directly from LLVM. We are currently exploring the potential benefits of implementing higher-level virtual machines such as JVM or CLI on top of LLVM.

The subsections below describe the key components of the LLVM compiler architecture, emphasizing design and implementation features that make the capabilities above practical and efficient.

3.2 Compile-Time: External front-end & static optimizer

External static LLVM compilers (referred to as front-ends) translate source-language programs into the LLVM virtual instruction set. Each static compiler can perform three key tasks, of which the first and third are optional: (1) Perform language-specific optimizations, e.g., optimizing closures in languages with higher-order functions. (2) Translate source programs to LLVM code, synthesizing as much useful LLVM type information as possible, especially to expose pointers, structures, and arrays. (3) Invoke LLVM passes for global or interprocedural optimizations at the module level. The LLVM optimizations are built into libraries, making it easy for front-ends to use them.

The front-end does not have to perform SSA construction. Instead, variables can be allocated on the stack (which is not in SSA form), and the LLVM stack promotion and scalar expansion passes can be used to build SSA form effectively. Stack promotion converts stack-allocated scalar values to SSA registers if their address does not escape the current function, inserting φ functions as necessary to preserve SSA form. Scalar expansion precedes this and expands local structures to scalars wherever possible, so that their fields can be mapped to SSA registers as well.

Note that many "high-level" optimizations are not really language-dependent, and are often special cases of more general optimizations that may be performed on LLVM code. For example, both virtual function resolution for object-oriented languages (described in Section 4.1.2) and tail-recursion elimination, which is crucial for functional languages, can be done in LLVM. In such cases, it is better to extend the LLVM optimizer to perform the transformation, rather than investing effort in code which only benefits a particular front-end. This also allows the optimizations to be performed throughout the lifetime of the program.

3.3 Linker & Interprocedural Optimizer

Link time is the first phase of the compilation process where most⁷ of the program is available for analysis and transformation. As such, link-time is a natural place to perform aggressive interprocedural optimizations across the entire program. The link-time optimizations in LLVM operate on the LLVM representation directly, taking advantage of the semantic information it contains. LLVM currently includes a number of interprocedural analyses, such as a context-sensitive points-to analysis (Data Structure Analysis [31]), call graph construction, and Mod/Ref analysis, and interprocedural transformations like inlining, dead global elimination, dead argument elimination, dead type elimination, constant propagation, array bounds check elimination [28], simple structure field reordering, and Automatic Pool Allocation [30].

⁷ Note that shared libraries and system libraries may not be available for analysis at link time, or may be compiled directly to native code.

The design of the compile- and link-time optimizers in LLVM permits the use of a well-known technique for speeding up interprocedural analysis. At compile-time, interprocedural summaries can be computed for each function in the program and attached to the LLVM bytecode. The link-time interprocedural optimizer can then process these interprocedural summaries as input instead of having to compute results from scratch. This technique can dramatically speed up incremental compilation when a small number of translation units are modified [7]. Note that this is achieved without building a program database or deferring the compilation of the input source code until link-time.

3.4 Offline or JIT Native Code Generation

Before execution, a code generator is used to translate from LLVM to native code for the target platform (we currently support the Sparc V9 and x86 architectures), in one of two ways. In the first option, the code generator is run statically at link time or install time, to generate high-performance native code for the application, using possibly expensive code generation techniques. If the user decides to use the post-link (runtime and offline) optimizers, a copy of the LLVM bytecode for the program is included into the executable itself. In addition, the code generator inserts light-weight instrumentation into the program to identify frequently executed regions of code.

Alternatively, a just-in-time Execution Engine can be used which invokes the appropriate code generator at runtime, translating one function at a time for execution (or uses the portable LLVM interpreter if no native code generator is available). The JIT translator can also insert the same instrumentation as the offline code generator.

3.5 Runtime Path Profiling & Reoptimization

One of the goals of the LLVM project is to develop a new strategy for runtime optimization of ordinary applications. Although that work is outside the scope of this paper, we briefly describe the strategy and its key benefits.

As a program executes, the most frequently executed execution paths are identified through a combination of offline and online instrumentation [39]. The offline instrumentation (inserted by the native code generator) identifies frequently executed loop regions in the code. When a hot loop region is detected at runtime, a runtime instrumentation library instruments the executing native code to identify frequently-executed paths within that region. Once hot paths are identified, we duplicate the original LLVM code into a trace, perform LLVM optimizations on it, and then regenerate native code into a software-managed trace cache. We then insert branches between the original code and the new native code.

The strategy described here is powerful because it combines the following three characteristics: (a) Native code generation can be performed ahead-of-time using sophisticated algorithms to generate high-performance code. (b) The native code generator and the runtime optimizer can work together since they are both part of the LLVM framework, allowing the runtime optimizer to exploit support from the code generator (e.g., for instrumentation and simplifying transformations). (c) The runtime optimizer can use high-level information from the LLVM representation to perform sophisticated runtime optimizations.

We believe these three characteristics together represent one "optimal" design point for a runtime optimizer because they allow the best choice in three key aspects: high-quality initial code generation (offline rather than online), cooperative support from the code generator, and the ability to perform sophisticated analyses and optimizations (using LLVM rather than native code as the input).

3.6 Offline Reoptimization with End-user Profile Information

Because the LLVM representation is preserved permanently, it enables transparent offline optimization of applications during idle-time on an end-user's system. Such an optimizer is simply a modified version of the link-time interprocedural optimizer, but with a greater emphasis on profile-driven and target-specific optimizations.

An offline, idle-time reoptimizer has several key benefits. First, as noted earlier, unlike traditional profile-guided optimizers (i.e., compile-time or link-time ones), it can use profile information gathered from end-user runs of the application. It can even reoptimize an application multiple times in response to changing usage patterns over time (or optimize differently for users with differing patterns). Second, it can tailor the code to detailed features of a single target machine, whereas traditional binary distributions of code must often be run on many different machine configurations with compatible architectures and operating systems. Third, unlike the runtime optimizer (which has both the previous benefits), it can perform much more aggressive optimizations because it is run offline.

Nevertheless, runtime optimization can further improve performance because of the ability to perform optimizations based on runtime values as well as path-sensitive optimizations (which can cause significant code growth if done aggressively offline), and to adaptively optimize code for changing execution behavior within a run. For dynamic, long-running applications, therefore, the runtime and offline reoptimizers could coordinate to ensure the highest achievable performance.

4. APPLICATIONS AND EXPERIENCES

Sections 2 and 3 describe the design of the LLVM code representation and compiler architecture. In this section, we evaluate this design in terms of three categories of issues: (a) the characteristics of the representation; (b) the speed of performing whole-program analyses and transformations in the compiler; and (c) illustrative uses of the LLVM system for challenging compiler problems, focusing on how the novel capabilities in LLVM benefit these uses.

4.1 Representation Issues

We evaluate three important characteristics of the LLVM representation. First, a key aspect of the representation is the language-independent type system. Does this type system provide any useful information when it can be violated with casts? Second, how do high-level language features map onto the LLVM type system and code representation? Third, how large is the LLVM representation when written to disk?

4.1.1 What value does type information provide?

Reliable type information about programs can enable the optimizer to perform aggressive transformations that would
be difficult otherwise, such as reordering two fields of a structure or optimizing memory management [12, 30]. As noted in Section 2.2, however, declared type information in LLVM is not reliable and some analysis (typically including a pointer analysis) must check the declared type information before it can be used. A key question is how much reliable type information is available in programs compiled to LLVM?

LLVM includes a flow-insensitive, field-sensitive and context-sensitive points-to analysis called Data Structure Analysis (DSA) [31]. Several transformations in LLVM are based on DSA, including Automatic Pool Allocation [30]. As part of the analysis, DSA extracts LLVM types for a subset of memory objects in the program. It does this by using declared types in the LLVM code as speculative type information, and checks conservatively whether memory accesses to an object are consistent with those declared types⁸ (note that it does not perform any type-inference or enforce type safety).

⁸ DSA is actually quite aggressive: it can often extract type information for objects stored into and loaded out of a "generic" void* data structure, despite the casts to and from void*.

For a wide range of benchmarks, we measured the fraction of static load and store operations for which reliable type information about the accessed objects is available using DSA. Table 1 shows this statistic for the C benchmarks in SPEC CPU2000. Benchmarks written in a more disciplined style (e.g., the Olden and Ptrdist benchmarks) had nearly perfect results, scoring close to 100% in most cases.

    Benchmark      Typed     Untyped   Typed
    Name           Accesses  Accesses  Percent
    164.gzip           1654        61    96.4%
    175.vpr            4038       371    91.6%
    176.gcc           25747     33179    43.7%
    177.mesa           2811     19668    12.5%
    179.art             572         0   100.0%
    181.mcf             571         0   100.0%
    183.equake          799       114    87.5%
    186.crafty         9734       383    96.2%
    188.ammp           2109      2598    44.8%
    197.parser         1577      2257    41.1%
    253.perlbmk        9678     22302    30.3%
    254.gap            6432     15117    29.8%
    255.vortex        13397      8915    60.0%
    256.bzip2          1011        52    95.1%
    300.twolf         13028      1196    91.6%
    average                             68.04%

Table 1: Loads and Stores which are provably typed

The table shows that many of these programs (164, 175, 179, 181, 183, 186, 256, & 300) have a surprisingly high proportion of memory accesses with reliable type information, despite using a language that does not encourage disciplined use of types. The leading causes of loss of type information in the remaining programs are the use of custom memory allocators (in 197, 254, & 255), inherently non-type-safe program constructs such as using different structure types for the same objects in different places (176, 253 & 254), and imprecision due to DSA (in 177 & 188). Overall, despite the use of custom allocators, casting to and from void*, and other C tricks, DSA is still able to verify the type information for an average of 68% of accesses across these programs.

It is important to note that similar results would be very difficult to obtain if LLVM had been an untyped representation. Intuitively, checking that declared types are respected is much easier than inferring those types, for structure and array types in a low-level code representation. As an example, an earlier version of the LLVM C front-end was based on GCC's RTL internal representation, which provided little useful type information, and both DSA and pool allocation were much less effective. Our new C/C++ front-end is based on the GCC Abstract Syntax Tree representation, which makes much more type information available.

4.1.2 How do high-level features map onto LLVM?

Compared to source languages, LLVM is a much lower-level representation. Even C, which itself is quite low-level, has many features which must be lowered by a compiler targeting LLVM. For example, complex numbers, structure copies, unions, bit-fields, variable-sized arrays, and setjmp/longjmp all must be lowered by an LLVM C compiler. In order for the representation to support effective analyses and transformations, the mapping from source-language features to LLVM should capture the high-level operational behavior as cleanly as possible.

We discuss this issue by using C++ as an example, since it is the richest language for which we have an implemented front-end. We believe that all the complex, high-level features of C++ are expressed clearly in LLVM, allowing their behavior to be effectively analyzed and optimized:

• Implicit calls (e.g. copy constructors) and parameters (e.g. 'this' pointers) are made explicit.

• Templates are fully instantiated by the C++ front-end before LLVM code is generated. (True polymorphic types in other languages would be expanded into equivalent code using non-polymorphic types in LLVM.)

• Base classes are expanded into nested structure types. For this C++ fragment:

      class base1 { int Y; };
      class base2 { float X; };
      class derived : base1, base2 { short Z; };

  the LLVM type for class derived is '{ {int}, {float}, short }'. If the classes have virtual functions, a v-table pointer would also be included and initialized at object allocation time to point to the virtual function table, described below.

• A virtual function table is represented as a global, constant array of typed function pointers, plus the type-id object for the class. With this representation, virtual method call resolution can be performed by the LLVM optimizer as effectively as by a typical source compiler (more effectively if the source compiler uses only per-module instead of cross-module pointer analysis).

• C++ exceptions are lowered to the 'invoke' and 'unwind' instructions as described in Section 2.4, exposing exceptional control flow in the CFG. In fact, having this information available at link time enables LLVM to use an interprocedural analysis to eliminate unused exception handlers. This optimization is much less effective if done on a per-module basis in a source-level compiler.
We believe that similarly clean LLVM implementations exist for most constructs in other language families like Scheme, the ML family, SmallTalk, Java and Microsoft CLI. We aim to explore these issues in the future, and preliminary work is underway on the implementation of JVM and OCaml front-ends.

4.1.3 How compact is the LLVM representation?

Since code for the compiled program is stored in the LLVM representation throughout its lifetime, it is important that it not be too large. The flat, three-address form of LLVM is well suited for a simple linear layout, with most instructions requiring only a single 32-bit word each in the file. Figure 5 shows the size of LLVM files for SPEC CPU2000 executables after linking, compared to native X86 and 32-bit Sparc executables compiled by GCC 3.3 at optimization level -O3.

[Figure 5: Executable sizes for LLVM, X86, Sparc (in KB). The chart plots, for each SPEC CPU2000 benchmark (164 through 300, plus the average), the executable size in KB on a 0-2500 scale, with one bar each for LLVM, X86, and Sparc.]

The figure shows that LLVM code is about the same size as native X86 executables (a denser, variable-size instruction set), and significantly smaller than SPARC (a traditional 32-bit instruction RISC machine). We believe this is a very good result given that LLVM encodes an infinite register set, rich type information, control flow information, and data-flow (SSA) information that native executables do not.

Currently, large programs are encoded less efficiently than smaller ones because they have a larger set of register values available at any point, making it harder to fit instructions into a 32-bit encoding. When an instruction does not fit into a 32-bit encoding, LLVM falls back on a 64-bit or larger encoding, as needed. Though it would be possible to make the fall-back case more efficient, we have not attempted to do so. Also, as with native executables, general-purpose file compression tools (e.g. bzip2) are able to reduce the size of bytecode files to about 50% of their uncompressed size, indicating substantial margin for improvement.

4.1.4 How fast is LLVM?

An important aspect of LLVM is that the low-level representation enables efficient analysis and transformation, because of the small, uniform instruction set, the explicit CFG and SSA representations, and careful implementation of data structures. This speed is important for uses "late" in the compilation process (i.e., at link-time or run-time). In order to provide a sense for the speed of LLVM, Table 2 shows the table of runtimes for several interprocedural optimizations. All timings were collected on a 3.06GHz Intel Xeon processor. The LLVM compiler system was compiled using the GCC 3.3 compiler at optimization level -O3.

    Benchmark     DGE     DAE     inline   GCC
    164.gzip      0.0018  0.0063  0.0127   1.937
    175.vpr       0.0096  0.0082  0.0564   5.804
    176.gcc       0.0496  0.1058  0.6455  55.436
    177.mesa      0.0051  0.0312  0.0788  20.844
    179.art       0.0002  0.0007  0.0085   0.591
    181.mcf       0.0010  0.0007  0.0174   1.193
    183.equake    0.0000  0.0009  0.0100   0.632
    186.crafty    0.0016  0.0162  0.0531   9.444
    188.ammp      0.0200  0.0072  0.1085   5.663
    197.parser    0.0021  0.0096  0.0516   5.593
    253.perlbmk   0.0137  0.0439  0.8861  25.644
    254.gap       0.0065  0.0384  0.1317  18.250
    255.vortex    0.1081  0.0539  0.2462  20.621
    256.bzip2     0.0015  0.0028  0.0122   1.520
    300.twolf     0.0712  0.0152  0.1742  11.986

Table 2: Interprocedural optimization timings (in seconds)

The table includes numbers for several transformations: DGE (aggressive⁹ Dead Global variable and function Elimination), DAE (aggressive Dead Argument and return value Elimination), and inline (a function integration pass). All these interprocedural optimizations work on the whole program at link-time. In addition, they spend most of their time traversing and modifying the code representation directly, so they reflect the costs of processing the representation.¹⁰ As a reference for comparison, the GCC column indicates the total time the GCC 3.3 compiler takes to compile the program at -O3.

⁹ "Aggressive" DCEs assume objects are dead until proven otherwise, allowing dead objects with cycles to be deleted.
¹⁰ DSA (Data Structure Analysis) is a much more complex analysis, and it spends a negligible fraction of its time processing the code representation itself, so its run times are not indicative of the efficiency of the representation. It is interesting to note, however, that those times also are relatively fast compared with GCC compile times [31].

We find that in all cases, the optimization time is substantially less than that to compile the program with GCC, despite the fact that GCC does no cross-module optimization, and very little interprocedural optimization within a translation unit. In addition, the interprocedural optimizations scale mostly linearly with the number of transformations they perform. For example, DGE eliminates 331 functions and 557 global variables (which include string constants) from 255.vortex, DAE eliminates 103 arguments and 96 return values from 176.gcc, and 'inline' inlines 1368 functions (deleting 438 which are no longer referenced) in 176.gcc.

4.2 Applications using life-time analysis and optimization capabilities of LLVM

Finally, to illustrate the capabilities provided by the compiler framework, we briefly describe three examples of how LLVM has been used for widely varying compiler problems, emphasizing some of the novel capabilities described in the introduction.

4.2.1 Projects using LLVM as a general compiler infrastructure

As noted earlier, we have implemented several compiler techniques in LLVM. The most aggressive of these are
Data Structure Analysis (DSA) and Automatic Pool Allocation [30], which analyze and transform programs in terms of their logical data structures. These techniques inherit a few significant benefits from LLVM, especially: (a) these techniques are only effective if most of the program is available, i.e., at link-time; (b) type information is crucial for their effectiveness, especially pointers and structures; (c) the techniques are source-language independent; and (d) SSA significantly improves the precision of DSA, which is flow-insensitive.

Other researchers not affiliated with our group have been actively using or exploring the use of the LLVM compiler framework, in a number of different ways. These include using LLVM as an intermediate representation for binary-to-binary transformations, as a compiler back-end to support a hardware-based trace cache and optimization system, as a basis for runtime optimization and adaptation of Grid programs, and as an implementation platform for a novel programming language.

4.2.2 SAFECode: A safe low-level representation and execution environment

SAFECode is a "safe" code representation and execution environment, based on a type-safe subset of LLVM. The goal of the work is to enforce memory safety of programs in the SAFECode representation through static analysis, by using a variant of automatic pool allocation instead of garbage collection [19], and using extensive interprocedural static analysis to minimize runtime checks [28, 19].

The SAFECode system exploits nearly all capabilities of the LLVM framework, except runtime optimization. It directly uses the LLVM code representation, which provides the ability to analyze C and C++ programs, which is crucial for supporting embedded software, middle-ware, and system libraries. SAFECode relies on the type information in LLVM (with no syntactic changes) to check and enforce type safety. It relies on the array type information in LLVM to enforce array bounds safety, and uses interprocedural analysis to eliminate runtime bounds checks in many cases [28]. It uses interprocedural safety checking techniques, exploiting the link-time framework to retain the benefits of separate compilation (a key difficulty that led previous such systems to avoid using interprocedural techniques [17, 23]).

4.2.3 External ISA design for Virtual Instruction Set Computers

Virtual Instruction Set Computers [40, 16, 2] are processor designs that use two distinct instruction sets: an externally visible, virtual instruction set (V-ISA) which serves as the program representation for all software, and a hidden implementation-specific instruction set (I-ISA) that is the actual hardware ISA. A software translator co-designed with the hardware translates V-ISA code to the I-ISA transparently for execution, and is the only software that is aware of the I-ISA. This translator is essentially a sophisticated, implementation-specific back-end compiler.

In recent work, we argued that an extended version of the LLVM instruction set could be a good choice for the external V-ISA for such processor designs [2]. We proposed a novel implementation strategy for the virtual-to-native translator that enables offline code translation and caching of translated code in a completely OS-independent manner.

That work exploits the important features of the instruction set representation, and extends it to be suitable as a V-ISA for hardware. The fundamental benefit of LLVM for this work is that the LLVM code representation is low-level enough to represent arbitrary external software (including operating system code), yet provides rich enough information to support sophisticated compiler techniques in the translator. A second key benefit is the ability to do both offline and online translation, which is exploited by the OS-independent translation strategy.

5. RELATED WORK

We focus on comparing LLVM with three classes of previous work: other virtual-machine-based compiler systems, research on typed assembly languages, and link-time or dynamic optimization systems.

As noted in the introduction, the goals of LLVM are complementary to those of higher-level language virtual machines such as SmallTalk, Self, JVM, and the managed mode of Microsoft CLI. High-level virtual machines such as these require a particular object model and runtime system for use. This implies that they can provide higher-level type information about the program, but are not able to support languages that do not match their design (even object-oriented languages such as C++). Additionally, programs in these representations (except CLI) are required to be type-safe. This is important for supporting mobile code, but makes these virtual machines insufficient for non-type-safe languages and for low-level system code. It also significantly limits the amount of optimization that can be done before runtime because of the need for bytecode verification.

The Microsoft CLI virtual machine has a number of features that distinguish it from other high-level virtual machines, including explicit support for a wide range of features from multiple languages, language interoperability support, non-type-safe code, and "unmanaged" execution mode. Unmanaged mode allows CLI to represent code in arbitrary languages, including those that do not conform to its type system or runtime framework, e.g., ANSI-standard C++ [34]. However, code in unmanaged mode is not represented in the CLI intermediate representation (MSIL), and therefore is not subject to dynamic optimization in CLI. In contrast, LLVM allows code from arbitrary languages to be represented in a uniform, rich representation and optimized throughout the lifetime of the code. A second key difference is that LLVM lacks the interoperability features of CLI but also does not require source languages to match the runtime and object model for interoperability. Instead, it requires source-language compilers to manage interoperability, but then allows all such code to be exposed to LLVM optimizers at all stages.

The Omniware virtual machine [1] is closer to LLVM, because they use an abstract low-level RISC architecture and can support arbitrary code (including non-type-safe code) from any source language. However, the Omniware instruction set lacks the higher-level type information of LLVM. In fact, it allows (and requires) source compilers to choose data layouts, perform address arithmetic, and perform register allocation (to a small set of virtual registers). All these features make it difficult to perform any sophisticated analysis on the resulting Omniware code. These differences from LLVM arise because the goals of their work are primarily to provide code mobility and safety, not a basis for lifelong code optimization. Their virtual machine compiles Omniware code to native code at runtime, and performs only relatively simple optimizations plus some stronger machine-dependent optimizations.

Kistler and Franz describe a compilation architecture for performing optimization in the field, using simple initial load-time code generation, followed by profile-guided runtime optimization [27]. Their system targets the Oberon language, uses Slim Binaries [22] as its code representation, and provides type safety and memory management similar to other high-level virtual machines. They do not attempt to support arbitrary languages or to use a transparent runtime system, as LLVM does. They also do not propose doing static or link-time optimization.

There has been a wide range of work on typed intermediate representations. Functional languages often use strongly typed intermediate languages (e.g. [38]) as a natural extension of the source language. Projects on typed assembly languages (e.g., TAL [35] and LTAL [10]) focus on preserving high-level type information and type safety during compilation and optimizations. The SafeTSA [3] representation is a combination of type information with SSA form, which aims to provide a safe but more efficient representation than JVM bytecode for Java programs. In contrast, the LLVM virtual instruction set does not attempt to preserve type

optimizations, with or without profile information.

6. CONCLUSION

This paper has described LLVM, a system for performing lifelong code analysis and transformation, while remaining transparent to programmers. The system uses a low-level, typed, SSA-based instruction set as the persistent representation of a program, but without imposing a specific runtime environment. The LLVM representation is language-independent, allowing all the code for a program, including system libraries and portions written in different languages, to be compiled and optimized together. The LLVM compiler framework is designed to permit optimization at all stages of a software lifetime, including extensive static optimization, online optimization using information from the LLVM code, and idle-time optimization using profile information gathered from programmers in the field. The current implementation includes a powerful link-time global and interprocedural optimizer, a low-overhead tracing technique for runtime optimization, and Just-In-Time and static code generators.

We showed experimentally and based on experience that LLVM makes available extensive type information even for
C programs, which can be used to safely perform a number
safety of high-level languages, to capture high-level type in-
of aggressive transformations that would normally be at-
formation from such languages, or to enforce code safety
tempted only on type-safe languages in source-level compil-
directly (though it can be used to do so [19]). Instead, the
ers. We also showed that the LLVM representation is com-
goal of LLVM is to enable sophisticated analyses and trans-
parable in size to X86 machine code and about 25% smaller
formations beyond static compile time.
than SPARC code on average, despite capturing much richer
There have been attempts to define a unified, generic, in-
type information as well as an infinite register set in SSA
termediate representation. These have largely failed, rang-
form. Finally, we gave several examples of whole-program
ing from the original UNiversal Computer Oriented Lan-
optimizations that are very efficient to perform on the LLVM
guage [42] (UNCOL), which was discussed but never im-
representation. A key question we are exploring currently is
plemented, to the more recent Architecture and language
whether high-level language virtual machines can be imple-
Neutral Distribution Format [4] (ANDF), which was im-
mented effectively on top of the LLVM runtime optimization
plemented but has seen limited use. These unified repre-
and code generation framework.
sentations attempt to describe programs at the AST level,
by including features from all supported source languages.
LLVM is much less ambitious and is more like an assembly 7. REFERENCES
language: it uses a small set of types and low-level opera- [1] A.-R. Adl-Tabatabai, G. Langdale, S. Lucco, and
tions, and the “implementation” of high-level language fea- R. Wahbe. Efficient and language-independent mobile
tures is described in terms of these types. In some ways, programs. In Proc. ACM SIGPLAN 1996 Conference
LLVM simply appears as a strict RISC architecture. on Programming Language Design and
Several systems perform interprocedural optimization at Implementation, pages 127–136. ACM Press, 1996.
link-time. Some operate on assembly code for a given [2] V. Adve, C. Lattner, M. Brukman, A. Shukla, and
processor [36, 41, 14, 37] (focusing primarily on machine- B. Gaeke. LLVA: A Low-level Virtual Instruction Set
dependent optimizations), while others export additional in- Architecture. In 36th Int’l Symp. on Microarchitecture,
formation from the static compiler, either in the form of an pages 205–216, San Diego, CA, Dec 2003.
IR or annotations [44, 21, 5, 26]. None of these approaches [3] W. Amme, N. Dalton, J. von Ronne, and M. Franz.
attempt to support optimization at runtime or offline after SafeTSA: A type safe and referentially secure
software is installed in the field, and it would be difficult to mobile-code representation based on static single
directly extend them to do so. assignment form. In PLDI, June 2001.
There have also been several systems that perform trans- [4] ANDF Consortium. The Architectural Neutral
parent runtime optimization of native code [6, 20, 16]. These Distribution Format. http://www.andf.org/.
systems inherit all the challenges of optimizing machine- [5] A. Ayers, S. de Jong, J. Peyton, and R. Schooler.
level code [36] in addition to the constraint of operating Scalable cross-module optimization. ACM SIGPLAN
under the tight time constraints of runtime optimization. Notices, 33(5):301–312, 1998.
In contrast, LLVM aims to provide type, dataflow (SSA) [6] V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: A
information, and an explicit CFG for use by runtime opti- transparent dynamic optimization system. In PLDI,
mizations. For example, our online tracing framework (Sec- pages 1–12, June 2000.
tion 3.5) directly exploits the CFG at runtime to perform
[7] M. Burke and L. Torczon. Interprocedural
limited instrumentation of hot loop regions. Finally, none
optimization: eliminating unnecessary recompilation.
of these systems supports link-time, install-time, or offline
Trans. Prog. Lang. and Sys, 15(3):367–399, 1993.
[8] M. G. Burke et al. The Jalapeño Dynamic Optimizing Compiler for Java. In Java Grande, pages 129–141, 1999.
[9] D. Chase. Implementation of exception handling. The Journal of C Language Translation, 5(4):229–240, June 1994.
[10] J. Chen, D. Wu, A. W. Appel, and H. Fang. A provably sound TAL for back-end optimization. In PLDI, San Diego, CA, Jun 2003.
[11] A. Chernoff et al. FX!32: A profile-directed binary translator. IEEE Micro, 18(2):56–64, 1998.
[12] T. M. Chilimbi, B. Davidson, and J. R. Larus. Cache-conscious structure definition. In ACM Symp. on Prog. Lang. Design and Implementation, Atlanta, GA, May 1999.
[13] CodeSourcery, Compaq, et al. C++ ABI for Itanium. http://www.codesourcery.com/cxx-abi/abi.html, 2001.
[14] R. Cohn, D. Goodwin, and P. Lowney. Optimizing Alpha executables on Windows NT with Spike. Digital Technical Journal, 9(4), 1997.
[15] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. Trans. Prog. Lang. and Sys., 13(4):451–490, October 1991.
[16] J. C. Dehnert et al. The Transmeta Code Morphing Software: Using speculation, recovery and adaptive retranslation to address real-life challenges. In 1st IEEE/ACM Symp. on Code Generation and Optimization, San Francisco, CA, Mar 2003.
[17] R. DeLine and M. Fahndrich. Enforcing high-level protocols in low-level software. In PLDI, Snowbird, UT, June 2001.
[18] L. P. Deutsch and A. M. Schiffman. Efficient implementation of the Smalltalk-80 system. In 11th Symp. on Principles of Programming Languages, pages 297–301, Jan 1984.
[19] D. Dhurjati, S. Kowshik, V. Adve, and C. Lattner. Memory safety without runtime checks or garbage collection. In Languages, Compilers, and Tools for Embedded Systems (LCTES), San Diego, Jun 2003.
[20] K. Ebcioglu and E. R. Altman. DAISY: Dynamic compilation for 100% architectural compatibility. In ISCA, pages 26–37, 1997.
[21] M. F. Fernández. Simple and effective link-time optimization of Modula-3 programs. ACM SIGPLAN Notices, 30(6):103–115, 1995.
[22] M. Franz and T. Kistler. Slim binaries. Communications of the ACM, 40(12), 1997.
[23] D. Grossman, G. Morrisett, T. Jim, M. Hicks, Y. Wang, and J. Cheney. Region-based memory management in Cyclone. In PLDI, Berlin, Germany, June 2002.
[24] D. L. Heine and M. S. Lam. A practical flow-sensitive and context-sensitive C and C++ memory leak detector. In PLDI, pages 168–181, 2003.
[25] M. Hind. Which pointer analysis should I use? In Int'l Symp. on Software Testing and Analysis, 2000.
[26] IBM Corp. XL FORTRAN: Eight Ways to Boost Performance. White Paper, 2000.
[27] T. Kistler and M. Franz. Continuous program optimization: A case study. ACM Trans. on Prog. Lang. and Sys., 25(4):500–548, Jul 2003.
[28] S. Kowshik, D. Dhurjati, and V. Adve. Ensuring code safety without runtime checks for real-time control systems. In Compilers, Architecture and Synthesis for Embedded Systems (CASES), Grenoble, Oct 2002.
[29] C. Lattner and V. Adve. LLVM Language Reference Manual. http://llvm.cs.uiuc.edu/docs/LangRef.html.
[30] C. Lattner and V. Adve. Automatic Pool Allocation for Disjoint Data Structures. In Proc. ACM SIGPLAN Workshop on Memory System Performance, Berlin, Germany, Jun 2002.
[31] C. Lattner and V. Adve. Data Structure Analysis: A Fast and Scalable Context-Sensitive Heap Analysis. Tech. Report UIUCDCS-R-2003-2340, Computer Science Dept., Univ. of Illinois at Urbana-Champaign, Apr 2003.
[32] T. Lindholm and F. Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, 1997.
[33] E. Meijer and J. Gough. A technical overview of the Common Language Infrastructure, 2002. http://research.microsoft.com/~emeijer/Papers/CLR.pdf.
[34] Microsoft Corp. Managed Extensions for C++ Specification. .NET Framework Compiler and Language Reference.
[35] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. Trans. Prog. Lang. and Sys., 21(3):528–569, May 1999.
[36] R. Muth. Alto: A Platform for Object Code Modification. PhD thesis, Department of Computer Science, University of Arizona, 1999.
[37] T. Romer, G. Voelker, D. Lee, A. Wolman, W. Wong, H. Levy, B. Bershad, and B. Chen. Instrumentation and optimization of Win32/Intel executables using Etch. In Proc. USENIX Windows NT Workshop, August 1997.
[38] Z. Shao, C. League, and S. Monnier. Implementing Typed Intermediate Languages. In Int'l Conf. on Functional Prog., pages 313–323, 1998.
[39] A. Shukla. Lightweight, cross-procedure tracing for runtime optimization. Master's thesis, Comp. Sci. Dept., Univ. of Illinois at Urbana-Champaign, Urbana, IL, Aug 2003.
[40] J. E. Smith, T. Heil, S. Sastry, and T. Bezenek. Achieving high performance via co-designed virtual machines. In Int'l Workshop on Innovative Architecture (IWIA), 1999.
[41] A. Srivastava and D. W. Wall. A practical system for intermodule code optimization at link-time. Journal of Programming Languages, 1(1):1–18, Dec. 1992.
[42] T. Steel. UNCOL: The myth and the fact. Annual Review in Automated Programming 2, 1961.
[43] D. Ungar and R. B. Smith. Self: The power of simplicity. In OOPSLA, 1987.
[44] D. Wall. Global register allocation at link-time. In Proc. SIGPLAN '86 Symposium on Compiler Construction, Palo Alto, CA, 1986.