
Rust's incremental compiler architecture

By Daroc Alden
December 3, 2024

The traditional structure of a compiler forms a pipeline — parsing, type-checking, optimization, and code-generation, usually in that order. But modern programming languages have requirements that are ill-suited to such a design. Increasingly, compilers are moving toward other designs in order to support incremental compilation and low-latency responses for uses like integration into IDEs. Rust has, for the last eight years, been pursuing a particularly unusual design; in that time compile times have substantially improved, but there's still more work to be done.

In a traditional compiler, each step of the pipeline normally needs to complete before the next is started. For example, optimization must complete before the compiler can begin emitting machine code. There are ways to improve this — working on translation units in parallel, caching some kinds of analysis, etc. — but, at a certain point, the architecture of the compiler itself makes it difficult to support incremental compilation.

Language design can influence how much of a problem this is in practice. For example, while the GCC developers did look into making the compiler incremental, the fact that C defines each file as a stand-alone translation unit makes it easier to add incrementalism at the level of the build system. Many modern languages don't observe the same separation between files, however. In Rust, an entire crate is a single translation unit; cargo, Rust's build system, handles only recompiling changed crates, but adding incrementalism within a crate requires special support from the compiler.

So, in response to wishes for faster compile times, Rust has been exploring an alternative to the traditional structure of a compiler by switching to a query-based model: instead of having a fixed pipeline, the compiler has a set of queries for different properties of the program. Each query can either retrieve the answer from a persistent compilation database, or, if the answer has not yet been cached, calculate it from scratch by making other queries and combining their results. For example, the compiler uses the type_of() query to infer the types of local bindings. The query takes the ID of a local definition, and then calls other queries to determine whether it is an impl Trait type, what the abstract syntax tree of the definition is, what the types of the variables it refers to are, and so on.
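
To make that shape concrete, here is a minimal sketch of a demand-driven query in ordinary Rust; the context type, the queries, and the caching policy are all invented for illustration and are not the compiler's real API:

    use std::cell::RefCell;
    use std::collections::HashMap;

    // A toy compilation context: each query checks a cache first, and only
    // computes (and records) an answer on a miss, by asking other queries.
    struct Ctx {
        source: HashMap<u32, String>,              // definition ID -> source text
        type_cache: RefCell<HashMap<u32, String>>, // memoized "type_of" answers
    }

    impl Ctx {
        // Stand-in for the query that returns the parsed form of a definition.
        fn ast_of(&self, def_id: u32) -> String {
            self.source[&def_id].clone()
        }

        // A "type_of"-style query: answered from the cache when possible,
        // otherwise computed by combining the results of other queries.
        fn type_of(&self, def_id: u32) -> String {
            if let Some(ty) = self.type_cache.borrow().get(&def_id) {
                return ty.clone();
            }
            let ast = self.ast_of(def_id);
            let ty = if ast.contains('"') { "&str".to_string() } else { "i32".to_string() };
            self.type_cache.borrow_mut().insert(def_id, ty.clone());
            ty
        }
    }

    fn main() {
        let mut source = HashMap::new();
        source.insert(1, r#"const GREETING = "hello";"#.to_string());
        let ctx = Ctx { source, type_cache: RefCell::new(HashMap::new()) };
        assert_eq!(ctx.type_of(1), "&str"); // computed, then cached
        assert_eq!(ctx.type_of(1), "&str"); // served from the cache
    }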

Suppose that the programmer changes the type of a constant in their crate. When they run the compiler, it will evaluate a top-level query to produce an output binary. That query will in turn query for the final versions of each function in the crate (after type-checking and optimization). Most of those will not have changed, so the answers are provided from the on-disk compilation database. The queries for functions that depend on the changed constant, however, will get re-run — and will, themselves, call other queries until the whole process ends up at the query corresponding to the parsed representation of the single changed definition. Only the minimum amount of analysis and compilation needed to accommodate that change is performed.

This approach is reminiscent of a build system — and that's deliberate. Build systems, while not always perfect, usually do a good job of tracking the dependencies between components and correctly caching partial results. A compiler that has the architecture of a build system can take advantage of the research that has been done on build systems to do the same.

Rust's approach

The Rust project began the process of rewriting its compiler to support incremental compilation in 2016. While that redesign is not yet fully complete, large portions of the compiler now use a query-based approach. LWN recently covered Sparrow Li's talk that discussed what remains to be done.

The Rust compiler organizes queries using a structure called TyCtxt. When the compiler starts up, the various modules all register functions called "providers" that implement each query. Then, depending on how the compiler was invoked, it parses the code (because the parser has not yet been integrated into the query system) and runs a set of top-level queries to compile an artifact, format the code, produce warnings, or whatever else was requested.
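
The registration step can be pictured with a small sketch, loosely modeled on the compiler's table of providers; the field names, signatures, and default behavior here are simplified stand-ins rather than rustc's actual definitions:

    // Each query has a slot in a table of function pointers; compiler modules
    // fill in the slots for the queries they implement.
    struct Providers {
        type_of: fn(u32) -> &'static str,
        mir_built: fn(u32) -> &'static str,
    }

    fn missing_provider(_def_id: u32) -> &'static str {
        panic!("no provider registered for this query");
    }

    // One registration function per module, called at startup.
    fn provide_type_checking(providers: &mut Providers) {
        providers.type_of = |_def_id| "i32";
    }

    fn provide_mir_building(providers: &mut Providers) {
        providers.mir_built = |_def_id| "<MIR body>";
    }

    fn main() {
        let mut providers = Providers {
            type_of: missing_provider,
            mir_built: missing_provider,
        };
        provide_type_checking(&mut providers);
        provide_mir_building(&mut providers);

        // A query invocation dispatches through the table; in the real
        // compiler this happens behind TyCtxt, which also handles caching.
        println!("type_of(42) = {}", (providers.type_of)(42));
        println!("mir_built(42) = {}", (providers.mir_built)(42));
    }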

Providers need to be pure functions, since the compiler may not always need to run them, so the programmer can't depend on any side effects. This also has the effect of forcing those parts of the compiler that have been converted to the query system to avoid global state. The results of queries could in theory be any type, but in practice the compiler requires that they be values that can be cheaply copied, in order to simplify the ownership of cached values. Not every query is saved to disk, but most queries return types that can be serialized and hashed, so that they can be.
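
For a sense of what "cheaply copied" means in practice, here is a toy result type (the names are invented; rustc's actual arena and interning machinery is more involved): the value a query hands back is just a small Copy handle into data owned elsewhere, and deriving Hash makes it possible to fingerprint the result without serializing it.

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // The data lives in an arena (here, just a Vec); the query result is a
    // thin reference to it, so copying a cached result is copying a pointer.
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    struct TyRef<'arena>(&'arena TyData);

    #[derive(PartialEq, Eq, Hash, Debug)]
    enum TyData {
        Int,
        Bool,
    }

    // A stand-in for the hashing used to fingerprint query results.
    fn fingerprint<T: Hash>(value: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        hasher.finish()
    }

    fn main() {
        let arena = vec![TyData::Int, TyData::Bool];
        let a = TyRef(&arena[0]);
        let b = a; // `a` remains usable: the result is Copy
        assert_eq!(a, b);
        println!("fingerprint: {:#x}", fingerprint(&a));
    }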

There are some important differences in structure between Rust's queries and static build systems like Make, not just in the form that their results take. In Rust's implementation, the relationship between queries is implicit. Queries are always invoked via a TyCtxt; that structure can track the dependencies between queries by seeing which queries are invoked while another is executing. Also, unlike some build systems, the compiler tracks the order in which dependencies are invoked. This is important because the results of one query can change which other queries get executed — therefore, the order impacts the calculation of which queries need to be rerun when doing incremental compilation. For example, this query depends on different things depending on the result of subquery1():

    fn example_query<'tcx>(tcx: TyCtxt<'tcx>) -> SomeType {
        // Which branch runs, and therefore which other queries this one
        // depends on, is only known once subquery1() has been evaluated.
        if tcx.subquery1() {
            tcx.subquery2()
        } else {
            tcx.subquery3()
        }
    }

During the initial run of the compiler, when there are no saved results, the program collects the graph of which queries refer to each other. When the compiler is rerun, it checks that graph to see which queries need to be rerun because of changed inputs. As an additional optimization, if the inputs to a query have changed, but the result of the query has not, the compiler can detect that and cut off the process of recalculating queries early.
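
A toy version of that bookkeeping might look like the following (the structures and names are invented for illustration, and greatly simplified compared to the compiler's dependency graph): each query's recorded dependencies and result hashes from the previous run are consulted to decide whether it needs to be re-executed, and a dependency that was re-run but produced the same hash does not force anything downstream to be recomputed.

    use std::collections::HashMap;

    // What the previous run recorded for each query: which queries it
    // invoked, and a fingerprint of its result.
    struct Record {
        deps: Vec<String>,
        result_hash: u64,
    }

    // Decide whether `query` must be re-executed, given the previous run's
    // graph and the queries that have already been re-run this time.
    fn needs_rerun(
        query: &str,
        graph: &HashMap<String, Record>,
        rerun: &HashMap<String, u64>, // re-run query -> new result hash
    ) -> bool {
        graph[query].deps.iter().any(|dep| {
            match rerun.get(dep) {
                // Early cutoff: the dependency was re-run but produced the
                // same hash as before, so it does not invalidate this query.
                Some(new_hash) => *new_hash != graph[dep].result_hash,
                None => false, // dependency untouched
            }
        })
    }

    fn main() {
        let mut graph = HashMap::new();
        graph.insert("type_of(main)".to_string(),
            Record { deps: vec!["ast_of(main)".into()], result_hash: 1 });
        graph.insert("ast_of(main)".to_string(),
            Record { deps: vec![], result_hash: 7 });

        // The file changed and ast_of(main) was re-run, but (say) only a
        // comment changed, so its hash is unchanged: type_of(main) is reused.
        let mut rerun = HashMap::new();
        rerun.insert("ast_of(main)".to_string(), 7);
        assert!(!needs_rerun("type_of(main)", &graph, &rerun));

        // If the hash differs, everything downstream gets recomputed.
        rerun.insert("ast_of(main)".to_string(), 8);
        assert!(needs_rerun("type_of(main)", &graph, &rerun));
    }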

The process of figuring out whether an intermediate result has changed is complicated by the fact that any information about the incremental build needs to be shared between runs of the compiler by saving it to disk. The compiler uses a lot of internal ID numbers, which can differ from run to run, to organize different components, and serializing every intermediate result in the compiler would be expensive. So actually saving the results of cached queries is more complex than it might first appear.

In order to work around the problem of changing internal IDs, structures that are saved to disk are given "stable" forms of each ID. For example, function and type definitions are identified by path (e.g. std::collections::HashMap) instead of by internal ID. In order to avoid the overhead of deserializing intermediate results to detect changes, the compiler calculates and stores hashes for each item. Hashes are much more compact, and don't require complex deserialization, so this is a substantial performance improvement. The downside is that computing so many hashes actually causes significant overhead. The compiler's internal documentation calls it "the main reason why incremental compilation can be slower than non-incremental compilation."
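
A minimal illustration of the stable-ID idea (the types, paths, and numbers here are hypothetical, not rustc's DefId machinery): translate the volatile numeric ID into a path before writing anything to disk, and match items up by path on the next run.

    use std::collections::HashMap;

    fn main() {
        // Run 1: internal IDs as they happened to be assigned.
        let run1: HashMap<u32, &str> = HashMap::from([
            (17, "std::collections::HashMap"),
            (42, "mycrate::config::Settings"),
        ]);

        // What gets persisted: path -> result fingerprint, never the numeric ID.
        let saved: HashMap<&str, u64> =
            run1.iter().map(|(_, path)| (*path, 0xabcd)).collect();

        // Run 2: the same items receive different numeric IDs...
        let run2: HashMap<u32, &str> = HashMap::from([
            (3, "mycrate::config::Settings"),
            (99, "std::collections::HashMap"),
        ]);

        // ...but they can still be matched with the previous run by path.
        for (id, path) in &run2 {
            if let Some(hash) = saved.get(path) {
                println!("{path} (now id {id}) had fingerprint {hash:#x} last run");
            }
        }
    }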

These performance improvements introduce their own complications, of course. For example, if a result is never loaded from disk, the compiler won't necessarily be able to include it in the incremental-compilation database for the next run of the program. So, after generating results, the compiler does another pass through the query graph to fetch and update the results that it skipped earlier, in order to make sure that the size of the cache doesn't unnecessarily shrink. This increases the compiler's overall runtime — but what users usually care about is the time it takes to produce warnings and errors, not how long it takes the compiler to do cleanup before exiting. There are various other optimizations and tweaks applied to the largest queries to make the overall system more performant. The Rust Compiler Development Guide's section on the query system has more details.

Overall, the query-based approach introduces a lot of complexity compared to a simple pipeline-based approach. But any large, performance-sensitive codebase accumulates complexity over time — and having a single, well-understood system for incremental compilation may well win out over gradually introducing ad-hoc improvements to a pipeline-based system. In any case, Rust's incremental compilation has been stable enough to become the default for development builds. Release builds still avoid incremental compilation by default, however, out of an abundance of caution. Whether the query-based compiler design will ultimately be adopted by other languages remains to be seen.




Clever system !

Posted Dec 3, 2024 20:52 UTC (Tue) by matp75 (subscriber, #45699) [Link] (9 responses)

This is impressive and a bit crazy to do and achieve with a compiler! Completely agree that rebuilding only what has changed makes total sense as an optimization.

Clever system !

Posted Dec 4, 2024 7:58 UTC (Wed) by marcH (subscriber, #57642) [Link] (8 responses)

It looks crazy at first, then it just looks unusual and finally it all makes sense:

> This approach is reminiscent of a build system — and that's deliberate. Build systems, while not always perfect, usually do a good job of tracking the dependencies between components and correctly caching partial results. A compiler that has the architecture of a build system can take advantage of the research that has been done on build systems to do the same.

This is how most optimizations are achieved: don't repeatedly recompute/refetch what you already did in the past; cache it instead. The actually "crazy inefficient" thing has been using a pipeline approach for compilation for decades instead of a graph of dependencies.

I bet compiling files one by one was already a (painful) "optimization" when C was invented: maybe because computers were simply too limited at the time to compile the whole program at once?

Clever system !

Posted Dec 4, 2024 8:52 UTC (Wed) by Wol (subscriber, #4433) [Link] (7 responses)

> I bet compiling files one by one was already a (painful) "optimization" when C was invented: maybe because computers were simply too limited at the time to compile the whole program at once?

Okay, I'm talking ten years later, but we made extensive use of libraries. Compute time was expensive and pre-compiled libraries saved a lot of that. Plus, as various people keep going on about modern optimisations and how the "simple model" of how a CPU works no longer fits reality, back then your typical C or Fortran or whatever code really was pretty much "portable assembler" so a decent programmer could optimise the hell out of code at a source level.

So compiling individual files to object code and stashing them in a library just made obvious sense. And even back then (as virtual memory was becoming the norm) you would have a "text" and "data" portion so the library code would only exist as one copy, even when many people were using it. In Pr1mos each subroutine/function call would create a data frame on the user's stack (or allocate space for a data-frame in the executable at link time - this obviously couldn't recurse ...).

Our company computer back then, for example, only had 256KB of RAM (Or was it KW, 512KB. I never really knew). Compare that to your typical PC of the era, which had 4KB and maxed out at 64KB. My Jupiter Ace had a massive 19KB.

Cheers,
Wol

Clever system !

Posted Dec 4, 2024 10:43 UTC (Wed) by Karellen (subscriber, #67644) [Link] (6 responses)

As someone who's not yet got into Rust (any day now, promise!) what is the difference between a crate and a library?

Clever system !

Posted Dec 4, 2024 11:03 UTC (Wed) by ebee_matteo (subscriber, #165284) [Link] (4 responses)

There are some things to keep in mind (I am boiling the following down a bit to keep it simple, sorry if it is imprecise to some):

* Rust doesn't have a stable ABI.

* in Rust, everything is typically statically linked together; a manifest can specify different features, which can cause dependent libraries to be recompiled when feature selection changes. In this sense, a library crate (a crate can also be binary-only, of course) is not too far away from a static library, but since there is no ABI stability guarantee for Rust, everything needs to be recompiled each time the compiler / toolchain changes.

* A lot of types use generic parameters. These are monomorphized at compile time. A library crate cannot possibly know all the ways a parameterized type is instantiated by its users. Hence, type information needs to be encoded in the library, e.g. by emitting an intermediate representation for it from the AST, or the crate's source code needs to be available at compilation time to all its users.

* it gets worse with stuff such as proc macros :-)

So, a library crate is conceptually a library, but not a static or dynamic library in the C/C++ sense, which I think is what the original question was asking about.

Incidentally, C++20 modules introduce a lot of the same issues also to that language :-).

Clever system !

Posted Dec 4, 2024 12:28 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (2 responses)

> Incidentally, C++20 modules introduce a lot of the same issues also to that language :-).

Hmm. I'm not seeing the connections to your bullet points. C++20 modules certainly support distributing ABI-stable libraries (and roughly on the same order of magnitude of difficulty as pre-modules) better than Rust does. Care to elaborate?

Clever system !

Posted Dec 4, 2024 12:48 UTC (Wed) by ebee_matteo (subscriber, #165284) [Link] (1 responses)

For C++ templates, if you export a template from a module without providing the source code, it would also need to end up in a binary representation for it to be monomorphized by any user of that library.

There is an experimental draft of a cross-compiler ABI for this, but to my knowledge it is not part of any C++ standard, just some kind of gentlemen's agreement among compiler writers. People are still working on it and it is nowhere near complete. ABI stability is not even guaranteed across different versions of the same compiler (same as Rust, actually).

C++ does NOT define a stable ABI, it just kinda happened to settle down after decades of use. And for modules, which are a new feature, there is nothing uniform.

Clever system !

Posted Dec 4, 2024 16:30 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> For C++ templates, if you export a template from a module without providing the source code, it would also need to end up in a binary representation for it to be monomorphized by any user of that library.

No one has written anything about what this looks like in the Itanium ABI (or any other for that matter). AFAIK, modules still need to ship the module sources (just like headers before). Basically, PIMPL is still relevant and no, modules don't change the decl/impl file split. I don't see any appetite for no-module-interface library deployment from implementations, but maybe I'm just not listening in the right places.

> There is an experimental draft of a cross-compiler ABI for this

Are you talking about Microsoft's IFC format? That still doesn't resolve the "need to ship module source files" problem.

> C++ does NOT define a stable ABI

Nor will the language (in any likelihood). What is stable is the implementations' guarantees about how they generate ABIs when compiling. That has been sufficient so far.

(FD, should have mentioned in the prior comment, but I was rushing out the door: I implemented C++20 module support in CMake, co-chair WG21's SG15 Tooling, and am very involved in "how to build modules" processes as I hear about them.)

Clever system !

Posted Dec 6, 2024 6:23 UTC (Fri) by donald.buczek (subscriber, #112892) [Link]

> * Rust doesn't have a stable ABI.

It could be added, though, that Rust can both offer and use the C API and ABI. So it's not that Rust is less capable than C or C++, just that you wouldn't normally restrict Rust libraries for Rust users in that way.

Clever system !

Posted Dec 5, 2024 7:31 UTC (Thu) by samlh (subscriber, #56788) [Link]

> [...] what is the difference between a crate and a library?
To keep it short:
  • A crate is the rust unit of compilation, composed of 1 or more rust source files
  • Crates can be compiled as libraries or executables
  • Library crates get compiled to a .rlib file which contains compiled binary code along with metadata for:
    • exported APIs, static variables, constants, etc.
    • generic function implementations (in a partially compiled form)
    • other stuff :) [for example, when LTO is involved, the rlib can contain LLVM bitcode]
  • Think of .rlibs as equivalent to a C++ static library + header files + linking instructions.
  • The .rlib format (and the Rust ABI more generally) is not stable across compiler releases, so .rlib files are usually treated as temporary artifacts (similar to .o files).
  • When compiling the final executable, the binary code from the rlibs is statically linked together, along with instantiations of the imported generic functions (as needed).
  • It is also possible to compile crates as C-style static libraries (libfoo.a) or shared objects (libfoo.so). This is often done in order to expose a stable C-compatible abi to be used externally.

See Rust by example and the Rust book for more info.
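
For that last bullet, a minimal sketch of the C-ABI route (the crate layout and function name are made up; the Cargo side is just crate-type = ["cdylib", "staticlib"] under [lib]):

// src/lib.rs of a hypothetical crate built as a cdylib/staticlib: the
// function is exported with an unmangled name and the C calling convention,
// so C callers can link against the resulting libfoo.so or libfoo.a.
#[no_mangle]
pub extern "C" fn foo_add(a: i32, b: i32) -> i32 {
    a + b
}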

Release vs development builds

Posted Dec 5, 2024 12:46 UTC (Thu) by bjackman (subscriber, #109548) [Link] (2 responses)

> incremental compilation has been stable enough to become the default for development builds. Release builds still avoid incremental compilation by default

Does this mean "when rustc is doing a development build" or "in development builds of rustc"?

I found https://doc.rust-lang.org/cargo/reference/profiles.html which says how to enable it but I wonder if my dev builds already do so (default doesn't seem to be documented there).

Release vs development builds

Posted Dec 5, 2024 12:53 UTC (Thu) by excors (subscriber, #95769) [Link]

> (default doesn't seem to be documented there)

It looks like that's documented in the "Default profiles" section: the `dev` profile (default for `cargo build` etc) has `incremental = true`, and `release` has `incremental = false`.
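
And if you do want it for release builds, the override is just the documented key in Cargo.toml (a sketch):

[profile.release]
incremental = true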

Release vs development builds

Posted Dec 5, 2024 12:54 UTC (Thu) by intelfx (subscriber, #130118) [Link]

> Does this mean "when rustc is doing a development build" or "in development builds of rustc"?

The former.

The default settings are cited here: https://doc.rust-lang.org/cargo/reference/profiles.html#dev

Cross-crate dead code elimination

Posted Dec 5, 2024 18:54 UTC (Thu) by intgr (subscriber, #39733) [Link] (2 responses)

> When they run the compiler, it will evaluate a top-level query to produce an output binary. That query will in turn query for the final versions of each function in the crate (after type-checking and optimization).

> In Rust, an entire crate is a single translation unit

I figure this means that dependency crates are still built in their entirety?

If my top-level crate includes a large dependency crate, but I only use 1 function out of that dependency, ideally this query approach would be able to skip compiling everything in the dependency except for that function. And maybe some transitive dependency can be skipped entirely.

But that would require breaking down the crate boundary and making it transparent to the query system.

I guess fast incremental compilation is the initial goal of this work.
But hopefully it's on the roadmap to enable such cross-crate dead code elimination ahead of compilation.

Cross-crate dead code elimination

Posted Dec 6, 2024 11:24 UTC (Fri) by farnz (subscriber, #17727) [Link]

There's already parallelism between crates that could be extended to support this; the compiler prioritises finishing the .rmeta component of the build process and signals when the .rmeta file exists so that you can start building downstream crates before codegen takes place (but you can't start linking the final output until all the .rlib files are built).

At least in theory, since the .rmeta file contains everything that's needed to build downstream crates bar the actual code you link, you could extend the query system all the way to the linker, so that a "build" just produces the .rmeta files, and the link phase uses queries to get everything it needs to create the final output.

Cross-crate dead code elimination

Posted Dec 14, 2024 2:32 UTC (Sat) by saethlin (guest, #175049) [Link]

In general, we try rather hard to make sure that compile errors are surfaced, even in dead code. And dead code detection is really hard on generic code. So we pretty much need to run all the code through the compiler frontend (type-checking and borrow-checking) regardless of whether the code is reachable.

But apart from that, you can do a crude version of this right now by setting the flags -Zmir-opt-level=0 and -Zcross-crate-inline-threshold=always in your dependencies. Cargo profile overrides can do this unstably, for example:

[profile.release.package.aws_sdk_ec2]
rustflags = ["-Zmir-opt-level=0", "-Zcross-crate-inline-threshold=always"]

This basically delays all codegen to dependencies of the named crate, and then you get dead code elimination through the mechanism that rustc already uses to only lower required functions through the codegen backend. But those flags also may or may not ruin the incrementality of your builds, and may also make the entire compilation single-threaded, because the whole program is a single codegen unit based on your main.

I've worked a fair bit on this part of the compiler recently.


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds