Problem
In order to support cargo iterative builds with the nix-backend, similar to the .fingerprint concept in the cargo-legacy backend, we have to solve a few hard problems. A project, i.e. the code you write and update, typically consists of one or several root crate(s), and it usually uses crates from crates.io, which are added using cargo add serde.
A more complex Rust project is the cargo implementation itself, which is used as the example here to illustrate the challenges.
The main nix problem boils down to: "how do we generate a minimal list of input files for the sandboxed builds?" This has to be solved for root crates and for build.rs crates. The other problem is how to support incremental rustc builds inside a nix builder sandbox without breaking purity.
In short the challenges are:
- root unit(s) like lib/bin/...
- build.rs build_script_run phases
- incremental builds
recap: how is the source code passed into the nix builders (mkDerivation(s))
There are two concepts.
passing in most (project files) as src
This is the approach that works out of the box, but in the nix-backend it triggers a full rebuild of all dependents for actions like adding a space to README.md.
```nix
src = builtins.filterSource
  (path: type:
    let base = baseNameOf path;
    # builtins.match needs a regex matching the whole name: "result-.*"
    # excludes result-bin, result-doc, ...; "result-*" would not
    in !(base == "target" || base == "result" || builtins.match "result-.*" base != null)
  ) /home/nixos/tests/starship;
```
passing in a minimal subset (of project files) as src
This is ideal, since nix then tracks changes to exactly that subset of files and triggers rebuilds only when needed. This is how the .fingerprint concept is implemented in nix.
```nix
src = pkgs.lib.fileset.toSource {
  root = /home/nixos/cargo;
  fileset = pkgs.lib.fileset.unions [
    /home/nixos/cargo/Cargo.toml
    /home/nixos/cargo/Cargo.lock
  ];
};
```
Computing this list is a challenge.
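Once such a list exists, turning it into the fileset.unions expression above is mechanical. The Rust sketch below renders it; render_to_source is a hypothetical helper (not part of cargo or nixpkgs), and it assumes absolute paths below root that need no quoting as Nix path literals (no spaces, for example).

```rust
use std::path::Path;

/// Hypothetical helper: render a `pkgs.lib.fileset.toSource` expression that
/// only includes `files`. Assumes absolute paths below `root` that need no
/// quoting as Nix path literals.
fn render_to_source(root: &Path, files: &[&Path]) -> String {
    let mut out = String::from("pkgs.lib.fileset.toSource {\n");
    out.push_str(&format!("  root = {};\n", root.display()));
    out.push_str("  fileset = pkgs.lib.fileset.unions [\n");
    for f in files {
        // fileset.toSource requires every file to be under root, so check early
        assert!(f.starts_with(root), "{} is outside {}", f.display(), root.display());
        out.push_str(&format!("    {}\n", f.display()));
    }
    out.push_str("  ];\n}");
    out
}

fn main() {
    let root = Path::new("/home/nixos/cargo");
    let files = [
        Path::new("/home/nixos/cargo/Cargo.toml"),
        Path::new("/home/nixos/cargo/Cargo.lock"),
        Path::new("/home/nixos/cargo/src/cargo/lib.rs"),
    ];
    println!("src = {};", render_to_source(root, &files));
}
```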
diagram for cargo
in depth
1. normal unit(s) (root crates)
main problem at the moment: we don't have a minimal list of input files (i.e. dep-info), so we copy the whole project_dir in, which triggers a rebuild of the given root crate on trivial changes such as adding a space to README.md.
in theory we could implement this:
1. we generate a runner which only runs `rustc --emit=dep-info src/main.rs` to produce a minimal file list, and we run it outside a nix sandbox
2. we feed this list to the actual sandboxed builder (which we already have)
3. we monitor the input files from (1.) for changes and start again with (1.) when we see changes
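Step (1.) mostly boils down to reading the dep-info file that rustc writes. It uses Makefile syntax: the first rule names the emitted file followed by every source file the crate read, typically followed by one empty rule per source file. A minimal sketch of a parser, assuming only the "\ " escape for spaces in paths (other Make escapes are ignored) and skipping # comment lines such as # env-dep:; dep_info_inputs is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Split a Make prerequisite list on unescaped spaces ("\ " is a literal space).
fn split_prereqs(s: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut cur = String::new();
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' if chars.peek() == Some(&' ') => {
                chars.next();
                cur.push(' ');
            }
            ' ' => {
                if !cur.is_empty() {
                    out.push(std::mem::take(&mut cur));
                }
            }
            _ => cur.push(c),
        }
    }
    if !cur.is_empty() {
        out.push(cur);
    }
    out
}

/// Hypothetical helper: collect every input file named in a rustc dep-info file.
fn dep_info_inputs(dep_info: &str) -> BTreeSet<String> {
    let mut inputs = BTreeSet::new();
    for line in dep_info.lines() {
        let line = line.trim_end();
        // skip blank lines and comments such as "# env-dep:VAR=value"
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // "target: a.rs b.rs" carries the list; an empty rule "a.rs:" adds nothing
        if let Some(idx) = line.find(": ") {
            inputs.extend(split_prereqs(&line[idx + 2..]));
        }
    }
    inputs
}

fn main() {
    // In practice this is the .d file written by `rustc --emit=dep-info src/main.rs`.
    let sample = "main.d: src/main.rs src/cli.rs\n\nsrc/main.rs:\nsrc/cli.rs:\n";
    for path in dep_info_inputs(sample) {
        println!("{path}");
    }
}
```

The resulting set is exactly what would be fed to the sandboxed builder in (2.).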
Running (1.) is slow, about 6 seconds for the cargo (lib) crate alone, but it is still better than compiling everything.
hope: in theory `rustc --emit=dep-info` could be made incremental. There is an ongoing effort in the core compiler for other parts, and once this is implemented for dep-info as well, it would be much, much faster. I considered implementing this myself using syn or salsa (rust-analyzer), but grok mentioned that parts of salsa are inconsistent with the rustc implementation and that improvements in rustc's incremental compilation might make it obsolete.
2. build.rs build_script_run phase (for root crates)
this feature requires two upstream tickets to work:
- BUG: build.rs static Cargo.toml file declarations #6
- BUG: make build_script_run write to BUILD_OUT_DIR instead of OUT_DIR #5
in theory we could implement this:
We can never create a minimal file list as in the normal unit(s) (root crates) case, because there is no dep-info; instead, build_script_run generates output like:
```text
cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
cargo:rustc-link-search=native=/nix/store/byx7ahs386pskh8d5sdkrkpscfz9yyjp-openssl-3.4.1/lib
cargo:rustc-link-lib=ssl
cargo:rustc-link-lib=crypto
cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
cargo:rerun-if-changed=build/expando.c
cargo:rerun-if-env-changed=CC_x86_64-unknown-linux-gnu
cargo:rerun-if-env-changed=CC_x86_64_unknown_linux_gnu
cargo:rerun-if-env-changed=HOST_CC
cargo:rerun-if-env-changed=CC
```
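To see why these lines cannot drive the sandbox inputs, it helps to classify them. The hypothetical parser below sorts the directives shown above; note that the rerun-if-changed paths only become known after the script has run, so they cannot be used to compute the inputs of that same run. It accepts both the cargo: prefix and the newer cargo:: prefix and ignores all other output.

```rust
/// What a build script told cargo, split into the parts relevant for
/// computing Nix inputs. Only the directives shown above get their own field.
#[derive(Debug, Default, PartialEq)]
struct BuildScriptOutput {
    rerun_if_changed: Vec<String>,
    rerun_if_env_changed: Vec<String>,
    link_search: Vec<String>,
    link_lib: Vec<String>,
    other: Vec<String>,
}

fn parse_build_script_output(stdout: &str) -> BuildScriptOutput {
    let mut out = BuildScriptOutput::default();
    for line in stdout.lines() {
        // check "cargo::" first, otherwise "cargo:" would leave a stray ':'
        let Some(rest) = line
            .strip_prefix("cargo::")
            .or_else(|| line.strip_prefix("cargo:"))
        else {
            continue; // ordinary output, cargo ignores it
        };
        let Some((key, value)) = rest.split_once('=') else {
            continue;
        };
        let value = value.to_string();
        match key {
            "rerun-if-changed" => out.rerun_if_changed.push(value),
            "rerun-if-env-changed" => out.rerun_if_env_changed.push(value),
            "rustc-link-search" => out.link_search.push(value),
            "rustc-link-lib" => out.link_lib.push(value),
            _ => out.other.push(line.to_string()),
        }
    }
    out
}

fn main() {
    let stdout = "cargo:rustc-link-search=native=/nix/store/byx7ahs386pskh8d5sdkrkpscfz9yyjp-openssl-3.4.1/lib\n\
                  cargo:rustc-link-lib=ssl\n\
                  cargo:rerun-if-changed=build/expando.c\n\
                  cargo:rerun-if-env-changed=CC\n";
    let parsed = parse_build_script_output(stdout);
    println!("{parsed:#?}");
}
```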
instead we can do this:
- we have to supply all project_files to the nix build sandbox (for the build_script_run stage)
- we make this a Content-addressed (CA) derivation so when nothing changes no followup builds are triggered
- we run this stage anew on each `CARGO_BACKEND=nix cargo build`, ignoring the rerun-if output it produces
We should make sure that build_script_run writes not to OUT_DIR but to BUILD_OUT_DIR, and that the following stage reads from BUILD_OUT_DIR. For normal cargo we can internally just set BUILD_OUT_DIR = OUT_DIR, but for the nix stages this would be a huge relief: each stage has a different OUT_DIR, so scripts keep breaking when they read from an empty OUT_DIR, expecting a file generated in the previous stage. For this very scenario I added an example for coreutils later in this issue; just scroll down.
This is conceptually already implemented for multiple-build-scripts.
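On the build script side, the proposal could look like the sketch below: write generated files to BUILD_OUT_DIR when it is set, otherwise fall back to OUT_DIR, which matches setting BUILD_OUT_DIR = OUT_DIR for normal cargo. BUILD_OUT_DIR is the variable proposed in #5 and is not set by stock cargo; build_out_dir is a hypothetical helper that takes a lookup function so it can be tested without touching the process environment.

```rust
use std::path::PathBuf;

/// Hypothetical helper: prefer the proposed BUILD_OUT_DIR and fall back to
/// cargo's OUT_DIR (the behaviour plain cargo would get from
/// BUILD_OUT_DIR = OUT_DIR).
fn build_out_dir(lookup: impl Fn(&str) -> Option<String>) -> Option<PathBuf> {
    lookup("BUILD_OUT_DIR")
        .or_else(|| lookup("OUT_DIR"))
        .map(PathBuf::from)
}

fn main() {
    // In a real build.rs this would be followed by something like
    // std::fs::write(dir.join("generated.rs"), code).
    match build_out_dir(|k| std::env::var(k).ok()) {
        Some(dir) => println!("generated files go to {}", dir.display()),
        None => eprintln!("neither BUILD_OUT_DIR nor OUT_DIR is set (not running under cargo?)"),
    }
}
```

The consuming stage would then read from the same directory, instead of the common include!(concat!(env!("OUT_DIR"), "/generated.rs")), which in the nix-backend points at the fresh, empty OUT_DIR of that stage.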
build.rs runs are used for a few different things:
- generate code from files in project_root or from the host filesystem, so we get a generate.rs for example
- -sys crates: generate DEP_xxx variables by searching the host system using pkg-config, for openssl and similar
- other things, see Reduce the need for users to write build scripts rust-lang/cargo#14948
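For the -sys case, the DEP_xxx names follow a fixed rule: a package that declares links = "<name>" and whose build script emits metadata KEY=VALUE exposes it to its direct dependents as DEP_<NAME>_<KEY>. The sketch below mirrors our understanding of cargo's naming (upper-case, "-" replaced by "_"); the function names are our own.

```rust
/// Our reading of cargo's naming rule: upper-case, '-' becomes '_'.
fn envify(s: &str) -> String {
    s.chars()
        .flat_map(char::to_uppercase)
        .map(|c| if c == '-' { '_' } else { c })
        .collect()
}

/// Hypothetical helper: the variable a direct dependent sees for metadata
/// `key` emitted by the build script of a package with `links = "<links>"`.
fn dep_var_name(links: &str, key: &str) -> String {
    format!("DEP_{}_{}", envify(links), envify(key))
}

fn main() {
    // A -sys build script that found openssl via pkg-config might print
    // "cargo::metadata=include=<include dir>"; dependents then read:
    println!("{}", dep_var_name("openssl", "include")); // prints "DEP_OPENSSL_INCLUDE"
}
```

Because the value usually comes from probing the host (pkg-config), it is exactly the kind of output that differs between machines and has to be captured per nix stage.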
multiple-build-scripts
Even when supporting multiple-build-scripts: https://doc.rust-lang.org/cargo/reference/unstable.html#multiple-build-scripts
- https://users.rust-lang.org/t/build-script-with-multiple-files/29614/6
- https://github.com/rodrimati1992/type_level/tree/master/type_level_values
Content-addressed (CA) derivations
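Using Content-addressed (CA) derivations in nix could be a way to stop rebuilds from propagating onwards:
Since we can't make a minimal file list for the input of the derivation, we have to pass in everything, so even a change to an unrelated file triggers a rebuild of this derivation. With a CA derivation, however, an output that comes out bit-identical keeps the same store path, so the derivations depending on it are not rebuilt.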
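A toy model of that early cutoff: each stage's output is hashed, and dependents are rebuilt only when the hash differs from the previous build. Nix itself uses a cryptographic hash (SHA-256) of the output; DefaultHasher is used here only to keep the sketch std-only and is not collision resistant. EarlyCutoff and the stage name are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a real content hash; NOT collision resistant.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Remembers the last output hash of each stage.
#[derive(Default)]
struct EarlyCutoff {
    last_output: HashMap<String, u64>,
}

impl EarlyCutoff {
    /// Record a fresh output of `stage`; returns true if dependents must rebuild.
    fn record(&mut self, stage: &str, output: &[u8]) -> bool {
        let new = content_hash(output);
        match self.last_output.insert(stage.to_string(), new) {
            Some(old) => old != new,
            None => true, // first build: nothing to reuse
        }
    }
}

fn main() {
    let mut cutoff = EarlyCutoff::default();
    let generated = b"pub const VERSION: &str = \"1.0\";";
    cutoff.record("build_script_run", generated);
    // README.md changed, so the stage reran with all project files as input,
    // but it produced identical bytes: dependents are not rebuilt.
    let rebuild = cutoff.record("build_script_run", generated);
    println!("rebuild dependents: {rebuild}"); // prints "rebuild dependents: false"
}
```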
discussions on build.rs static file lists in Cargo.toml are at:
- https://internals.rust-lang.org/t/cargo-fingerprinting/23834
- adding feature rerun_if_changed to Cargo.toml directly rust-lang/cargo#6689
- Reduce the need for users to write build scripts rust-lang/cargo#14948
impure builds
Some build.rs scripts might require access to the host filesystem; for these we might need:
3. incremental builds
This feature is 'very' experimental until the issue below is implemented:
Incremental builds are a rustc feature for recompiling a single unit faster. A single unit, cargo (lib) for example, consists of 189 files. rustc tracks the compilation in its internal dependency graph, so results can be computed recursively. This is analogous to how the nix-backend inside cargo handles dependency crates like serde now: rustc computes a graph, computes hashes for the nodes and checks these for substitutes (reusable results from a previous build). If no substitutes are found, rustc compiles them instead.
https://blog.rust-lang.org/2016/09/08/incremental/
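The description above (hash each node, look for a substitute, compile on a miss) can be modelled in a few lines. This is a simplified fingerprint-and-memoize model of what the paragraph describes, not rustc's actual red/green query algorithm; Graph, Cache and the node names are made up, the graph is assumed to be acyclic, and fingerprints are recomputed naively.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy dependency graph: each node has its own source and a list of deps.
struct Graph {
    source: HashMap<&'static str, &'static str>,
    deps: HashMap<&'static str, Vec<&'static str>>,
}

/// Fingerprint -> artifact, kept across builds (rustc keeps its equivalent
/// under -C incremental=<dir>).
#[derive(Default)]
struct Cache {
    artifacts: HashMap<u64, String>,
    compiled: usize, // how many nodes were actually compiled
}

impl Graph {
    /// A node's fingerprint covers its own source and its deps' fingerprints.
    fn fingerprint(&self, node: &str) -> u64 {
        let mut h = DefaultHasher::new();
        self.source[node].hash(&mut h);
        for d in &self.deps[node] {
            self.fingerprint(d).hash(&mut h);
        }
        h.finish()
    }

    fn build(&self, node: &str, cache: &mut Cache) -> String {
        let fp = self.fingerprint(node);
        if let Some(hit) = cache.artifacts.get(&fp) {
            return hit.clone(); // substitute found, nothing to compile
        }
        // Miss: build deps first, then "compile" this node.
        let dep_out: Vec<String> = self.deps[node].iter().map(|d| self.build(d, cache)).collect();
        let out = format!("{}({})", node, dep_out.join(","));
        cache.compiled += 1;
        cache.artifacts.insert(fp, out.clone());
        out
    }
}

fn main() {
    let mut g = Graph {
        source: HashMap::from([("util", "fn a() {}"), ("parser", "use util;"), ("lib", "mod parser;")]),
        deps: HashMap::from([("util", vec![]), ("parser", vec!["util"]), ("lib", vec!["parser"])]),
    };
    let mut cache = Cache::default();
    g.build("lib", &mut cache);
    println!("first build compiled {} nodes", cache.compiled); // prints 3
    g.source.insert("lib", "mod parser; // edited");
    g.build("lib", &mut cache);
    println!("total after editing lib: {}", cache.compiled); // prints 4: only lib recompiled
}
```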
Setup for incremental builds
We use a global cache here, whereas cargo traditionally has a per-project cache, which is much easier to keep clean with cargo clean.
- create the directory layout:
  ```shell
  mkdir -p /cargo-incremental-target
  chown root:nixbld /cargo-incremental-target
  # the nixbld build users need write access; the default 0755 does not allow that
  chmod 775 /cargo-incremental-target
  ```
- in the mkDerivation don't write to $out, but into /tmp/out, and then copy the results to $out. rustc assumes these things are constant between builds (in order to reuse the incremental artifacts):
  - the same cargo / rustc binary
  - the same out dir (which means nix users have to copy the output one more time)
  - the same dependencies (inputs)
  - the same environment variables
- next you can use it
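When an incremental run unexpectedly recompiles everything, checking these constraints helps. The hypothetical IncrementalKey below records the four items from the list and reports which ones differ between two builds; the store paths in the example are made up. Writing straight to $out would show up as a changed out dir, which is why the output goes to /tmp/out first.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical record of what rustc expects to stay constant between
/// builds sharing one `-C incremental` directory.
#[derive(Clone, Debug, PartialEq)]
struct IncrementalKey {
    rustc: PathBuf,                // the cargo / rustc binary
    out_dir: PathBuf,              // e.g. /tmp/out, never $out
    inputs: Vec<PathBuf>,          // dependency store paths
    env: BTreeMap<String, String>, // environment seen by rustc
}

/// Names of the constraints that differ between two builds; an empty
/// result means the incremental artifacts can, in principle, be reused.
fn broken_constraints(prev: &IncrementalKey, cur: &IncrementalKey) -> Vec<&'static str> {
    let mut broken = Vec::new();
    if prev.rustc != cur.rustc {
        broken.push("rustc binary");
    }
    if prev.out_dir != cur.out_dir {
        broken.push("out dir");
    }
    if prev.inputs != cur.inputs {
        broken.push("dependencies");
    }
    if prev.env != cur.env {
        broken.push("environment variables");
    }
    broken
}

fn main() {
    let first = IncrementalKey {
        rustc: "/nix/store/aaaa-rustc/bin/rustc".into(),
        out_dir: "/tmp/out".into(),
        inputs: vec!["/nix/store/bbbb-serde".into()],
        env: BTreeMap::from([("CARGO_PKG_NAME".to_string(), "cargo".to_string())]),
    };
    // Writing straight into $out: that path differs for every derivation.
    let mut second = first.clone();
    second.out_dir = "/nix/store/cccc-cargo".into();
    println!("{:?}", broken_constraints(&first, &second)); // prints ["out dir"]
}
```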
This is the nix command:
```shell
time nix build --file nix/cargo_build_caller.nix target -L --option extra-sandbox-paths '/incremental-target=/cargo-incremental-target'
```
And this is the adaptation inside the crate (created automatically, you don't have to do anything):
${RUSTC} -v \
-Z incremental-info \
--crate-name cargo \
--edition=2021 src/cargo/lib.rs \
--crate-type lib \
--emit=metadata,link \
-C embed-bitcode=no \
-C debuginfo=2 \
+ -C incremental=/incremental-target \
-C split-debuginfo=unpacked \
--allow=clippy::all \
--warn=clippy::correctness \
--warn=clippy::self_named_module_files \