
.fingerprint support (build.rs challenges) #3

@qknight

Problem

In order to support iterative cargo builds with the nix-backend, similar to the .fingerprint concept in the cargo-legacy backend, we have to solve a few hard problems. A project, i.e. the code you write and update, typically consists of one or several root crate(s), which use crates from crates.io added with e.g. cargo add serde.

The cargo implementation itself is a more complex Rust project and serves as the example here to illustrate the challenges.

The main nix problem boils down to: "how to generate a minimal list of input files for the sandboxed builds". This has to be solved both for root crates and for build.rs crates. The other problem is how to support incremental rustc builds inside a nix builder sandbox without breaking purity.

In short the challenges are:

  1. root unit(s) like lib/bin/...
  2. build.rs build_script_run phases
  3. incremental builds

recap: how the source code is passed into the nix builders (mkDerivation(s))

There are two concepts.

passing most project files in as src

This approach works out of the box, but in the nix-backend it triggers a full rebuild of every derivation using this src for changes as small as adding a space to README.md.

src = builtins.filterSource
  (path: type:
    let base = baseNameOf path;
    # skip build outputs: target/, result and result-* (builtins.match must match the whole name)
    in !(base == "target" || base == "result" || builtins.match "result-.*" base != null))
  /home/nixos/tests/starship;

passing in a minimal subset (of project files) as src

This is ideal, since nix then only tracks changes to the files in that subset and triggers rebuilds only when one of them changes. This is how .fingerprint is implemented in nix.

src = pkgs.lib.fileset.toSource {
  root = /home/nixos/cargo;
  fileset = pkgs.lib.fileset.unions [
    /home/nixos/cargo/Cargo.toml
    /home/nixos/cargo/Cargo.lock
  ];
};
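The fileset expression above could be generated from a list of paths, for example the list produced by a dep-info run. A minimal Rust sketch, assuming paths relative to root and plain Nix path literals (no spaces or special characters); render_fileset is a hypothetical helper, not part of the nix-backend:

```rust
/// Hypothetical helper (illustration only): render a `pkgs.lib.fileset.toSource`
/// expression for `root` that includes only the given files (relative to root).
fn render_fileset(root: &str, files: &[&str]) -> String {
    let mut out = String::new();
    out.push_str("src = pkgs.lib.fileset.toSource {\n");
    out.push_str(&format!("  root = {root};\n"));
    out.push_str("  fileset = pkgs.lib.fileset.unions [\n");
    for f in files {
        // Absolute Nix path literals are written without quotes.
        out.push_str(&format!("    {root}/{f}\n"));
    }
    out.push_str("  ];\n};\n");
    out
}
```

Called as render_fileset("/home/nixos/cargo", &["Cargo.toml", "Cargo.lock"]), it reproduces the expression shown above.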

Computing this list is a challenge.

[Figure: diagram for cargo]

in depth

1. normal unit(s) (root crates)

main problem at the moment: we don't have a minimal list of input files (i.e. the dep-info), so we copy the whole project_dir in, which triggers a rebuild of the given root crate on simple changes such as adding a space to README.md

in theory we could implement this:

  1. we generate a runner which only runs rustc --emit=dep-info src/main.rs to generate a minimal file list, and run it outside a nix sandbox
  2. we feed this list to the actual sandboxed builder (which we already have)
  3. we monitor the input files from (1.) for changes and re-run (1.) when we see changes
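A sketch of the parsing half of step (1.): rustc --emit=dep-info writes a Makefile-style .d file (target: dep1 dep2 ...), and the prerequisite paths in it form the minimal file list. parse_dep_info is a hypothetical helper; it assumes Unix paths, treats a backslash followed by a space as an escaped space, and skips comment lines such as # env-dep:... entries:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collect the prerequisite paths from a Makefile-style
/// dep-info file (lines of the form `target: dep1 dep2 ...`).
/// Assumes Unix paths; comment lines such as `# env-dep:...` are skipped.
fn parse_dep_info(contents: &str) -> BTreeSet<String> {
    let mut files = BTreeSet::new();
    for line in contents.lines() {
        let line = line.trim_end();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Lines like `src/lib.rs:` have no ": " separator and no prerequisites.
        let Some(idx) = line.find(": ") else { continue };
        let mut current = String::new();
        let mut chars = line[idx + 2..].chars().peekable();
        while let Some(c) = chars.next() {
            match c {
                // `\ ` is an escaped space inside a path
                '\\' if chars.peek() == Some(&' ') => {
                    current.push(' ');
                    chars.next();
                }
                // an unescaped space ends the current path
                ' ' => {
                    if !current.is_empty() {
                        files.insert(std::mem::take(&mut current));
                    }
                }
                _ => current.push(c),
            }
        }
        if !current.is_empty() {
            files.insert(current);
        }
    }
    files
}
```

The resulting set is what step (2.) would hand to the sandboxed builder as its fileset.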

running (1.) is slow: it takes 6 seconds for the cargo (lib) crate alone, but that is still better than compiling everything.
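Step (3.) is what keeps those 6 seconds off the common path: the dep-info scan only needs to re-run when one of the listed files actually changed. A minimal sketch comparing content-hash snapshots; snapshot and changed_files are hypothetical helper names, and std's DefaultHasher stands in for a stable hash (its algorithm may change between Rust releases, so a snapshot persisted on disk would need something like SHA-256):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::{fs, io};

/// Hypothetical helper: map each tracked file to a hash of its contents.
fn snapshot(paths: &[&str]) -> io::Result<BTreeMap<String, u64>> {
    let mut snap = BTreeMap::new();
    for &p in paths {
        let mut h = DefaultHasher::new();
        fs::read(p)?.hash(&mut h);
        snap.insert(p.to_string(), h.finish());
    }
    Ok(snap)
}

/// Hypothetical helper: files added, modified or removed between two snapshots.
fn changed_files(prev: &BTreeMap<String, u64>, cur: &BTreeMap<String, u64>) -> Vec<String> {
    let mut changed: Vec<String> = cur
        .iter()
        .filter(|(p, h)| prev.get(*p) != Some(*h))
        .map(|(p, _)| p.clone())
        .collect();
    // files that disappeared also count as a change
    changed.extend(prev.keys().filter(|p| !cur.contains_key(*p)).cloned());
    changed.sort();
    changed
}
```

An empty result means the previous minimal file list is still valid and (1.) can be skipped.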

hope: in theory rustc --emit=dep-info could be made incremental. There is an effort in the core compiler to do this for other parts, and once it also covers dep-info, this step would be much, much faster. I considered implementing this myself using syn or salsa (rust-analyzer), but grok mentioned that parts of salsa are inconsistent with the rustc implementation and that improvements to rustc's incremental compilation might make such an effort obsolete.

2. build.rs build_script_run phase (for root crates)

this feature requires two upstream tickets to work:

in theory we could implement this:

we can never create a minimal file list the way we can for the normal unit(s) (root crates) case, because there is no dep-info; instead, build_script_run generates output like:

cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
cargo:rustc-link-search=native=/nix/store/byx7ahs386pskh8d5sdkrkpscfz9yyjp-openssl-3.4.1/lib
cargo:rustc-link-lib=ssl
cargo:rustc-link-lib=crypto
cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
cargo:rerun-if-changed=build/expando.c
cargo:rerun-if-env-changed=CC_x86_64-unknown-linux-gnu
cargo:rerun-if-env-changed=CC_x86_64_unknown_linux_gnu
cargo:rerun-if-env-changed=HOST_CC
cargo:rerun-if-env-changed=CC
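These directives only exist after the build script has run, which is why they can't serve as the input list of the derivation that runs it. A minimal sketch that groups the directives by key; parse_directives is a hypothetical helper, and it accepts both the cargo: and the newer cargo:: prefix:

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: group build-script directives (`cargo:KEY=VALUE` or
/// `cargo::KEY=VALUE`) by KEY, keeping values in output order.
/// Lines that are not directives are ignored.
fn parse_directives(stdout: &str) -> BTreeMap<String, Vec<String>> {
    let mut directives: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in stdout.lines() {
        // Check the two-colon form first, otherwise `cargo:` would leave a stray ':'.
        let Some(rest) = line
            .strip_prefix("cargo::")
            .or_else(|| line.strip_prefix("cargo:"))
        else {
            continue;
        };
        // Split at the first '=' only, so values like `native=/nix/store/...` stay intact.
        if let Some((key, value)) = rest.split_once('=') {
            directives.entry(key.to_string()).or_default().push(value.to_string());
        }
    }
    directives
}
```

On the output above, the "rerun-if-changed" entry is ["build/expando.c"]: a file list that only becomes known after the run.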

instead we can do this:

  1. we supply all project_files to the nix build sandbox (for the build_script_run stage)
  2. we make this a content-addressed (CA) derivation, so that when its output does not change, no follow-up builds are triggered
  3. we run it anew on every CARGO_BACKEND=nix cargo build, ignoring the rerun-if-* output it produces
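Step 2 relies on early cutoff: build_script_run re-runs whenever any project file changes, but dependents are rebuilt only if its output content changed. A toy model of that decision in Rust, assuming the output can be represented as file name to bytes; content_address and dependents_need_rebuild are hypothetical names, and DefaultHasher stands in for the cryptographic hash Nix actually computes:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical toy model: identify a build_script_run output by a hash of its
/// contents (file name -> bytes) rather than by its inputs.
fn content_address(output: &BTreeMap<String, Vec<u8>>) -> u64 {
    let mut h = DefaultHasher::new();
    // BTreeMap iterates in sorted key order, so insertion order does not matter.
    output.hash(&mut h);
    h.finish()
}

/// Early cutoff: dependents only rebuild if the content address changed.
fn dependents_need_rebuild(previous: Option<u64>, output: &BTreeMap<String, Vec<u8>>) -> bool {
    previous != Some(content_address(output))
}
```

With this model, a README.md edit re-runs the build script, but if it prints the same directives, dependents_need_rebuild returns false and nothing downstream is rebuilt.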

we should make sure that build_script_run writes not to OUT_DIR but to BUILD_OUT_DIR, and that the following stage reads from BUILD_OUT_DIR. For normal cargo we can internally just set BUILD_OUT_DIR = OUT_DIR, but for the nix stages this would be a huge relief: each stage has a different OUT_DIR, so scripts keep breaking when they read from an empty OUT_DIR and expect a file generated in the previous stage. For this very scenario I added an example for coreutils later in this issue, just scroll down.
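A sketch of the build.rs side of this proposal, assuming the proposed BUILD_OUT_DIR variable (cargo does not set it today) and falling back to OUT_DIR as stock cargo would; resolve_generated_dir and generated_dir are hypothetical helpers:

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: pick the directory for files generated in the
/// build_script_run stage. BUILD_OUT_DIR is only a proposal; with stock cargo
/// just OUT_DIR is set, matching the "BUILD_OUT_DIR = OUT_DIR" default.
fn resolve_generated_dir(build_out_dir: Option<OsString>, out_dir: Option<OsString>) -> Option<PathBuf> {
    build_out_dir.or(out_dir).map(PathBuf::from)
}

/// Hypothetical helper: read both variables from the build script's environment.
fn generated_dir() -> Option<PathBuf> {
    resolve_generated_dir(env::var_os("BUILD_OUT_DIR"), env::var_os("OUT_DIR"))
}
```

Both the writing stage and the reading stage would call the same helper, so in the nix-backend they agree on one directory even though their OUT_DIR values differ.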

this is conceptually already implemented for multiple-build-scripts.

build.rs runs are used for a few different things:

multiple-build-scripts

Even when supporting multiple-build-scripts: https://doc.rust-lang.org/cargo/reference/unstable.html#multiple-build-scripts

Content-addressed (CA) derivations

Using content-addressed (CA) derivations in nix could be a way to prevent rebuilds from propagating further downstream:

build.rs

This means that, since we can't make a minimal file list for the input of the derivation, we have to pass in everything, and even a change to an unrelated file triggers rebuilds.

discussions on build.rs static file lists in Cargo.toml are at:

impure builds

Some build.rs scripts might require access to the host filesystem; for these we might need:

NixOS/nix#6227

3. incremental builds

This feature is 'very' experimental until the issue below is implemented:

#7

Incremental compilation is a rustc feature that works within the compilation of a single unit. A single unit, cargo (lib) for example, consists of 189 files. Each compile uses the rustc compiler's internal dependency graph, so the work can be computed recursively over that graph. This is analogous to how the nix-backend inside cargo handles dependency crates like serde now: rustc computes a graph, computes hashes for the nodes and checks these for substitutes (results cached by a previous session). If no substitutes are found, rustc compiles those nodes instead.
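A toy model of the hash-and-reuse idea in Rust: each node's fingerprint covers its own source plus its dependencies' fingerprints, and only nodes whose fingerprint is missing from the cache are recompiled. This is a simplification for illustration, not rustc's actual red-green algorithm (which also compares the results of re-executed queries); Node, fingerprint and recompile are hypothetical names:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical node of a toy dependency graph, e.g. one module of cargo (lib).
struct Node {
    name: &'static str,
    source: &'static str,
    deps: Vec<usize>, // indices of the nodes this one depends on
}

/// Fingerprint = hash of the node's own source plus its dependencies'
/// fingerprints, computed recursively and memoized. Assumes an acyclic graph.
fn fingerprint(nodes: &[Node], i: usize, memo: &mut HashMap<usize, u64>) -> u64 {
    if let Some(&f) = memo.get(&i) {
        return f;
    }
    let mut h = DefaultHasher::new();
    nodes[i].source.hash(&mut h);
    for &d in &nodes[i].deps {
        fingerprint(nodes, d, memo).hash(&mut h);
    }
    let f = h.finish();
    memo.insert(i, f);
    f
}

/// Returns the nodes that must be recompiled (fingerprint not cached) and
/// records their new fingerprints; all other nodes are reused from the cache.
fn recompile(nodes: &[Node], cache: &mut HashMap<&'static str, u64>) -> Vec<&'static str> {
    let mut memo = HashMap::new();
    let mut rebuilt = Vec::new();
    for (i, node) in nodes.iter().enumerate() {
        let f = fingerprint(nodes, i, &mut memo);
        if cache.get(&node.name) != Some(&f) {
            rebuilt.push(node.name);
            cache.insert(node.name, f);
        }
    }
    rebuilt
}
```

Changing one leaf only recompiles that leaf and the nodes that depend on it, which is the same property nix gives the nix-backend at crate granularity.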

https://blog.rust-lang.org/2016/09/08/incremental/

Setup for incremental builds

This setup uses a global cache, while cargo traditionally has a per-project cache, which is much better for keeping things clean with cargo clean.

  1. create directory layout

    mkdir -p /cargo-incremental-target
    chown root:nixbld /cargo-incremental-target

  2. in the mkDerivation, don't write to $out but into /tmp/out, and then copy the results to $out

rustc assumes these things stay constant between builds (in order to reuse the incremental artifacts):

  • cargo / rustc binary
  • same out dir (which means nix users have to copy the output another time)
  • same dependencies (inputs)
  • same environment variables
  3. next you can use it

This is the nix command:

 time nix build --file nix/cargo_build_caller.nix target -L --option extra-sandbox-paths '/incremental-target=/cargo-incremental-target'

And this is the adaptation of the crate's rustc invocation (created automatically, you don't have to do anything):

      ${RUSTC} -v \
              -Z incremental-info \
              --crate-name cargo \
              --edition=2021 src/cargo/lib.rs \
              --crate-type lib \
              --emit=metadata,link \
              -C embed-bitcode=no \
              -C debuginfo=2 \
+             -C incremental=/incremental-target \
              -C split-debuginfo=unpacked \
              --allow=clippy::all \
              --warn=clippy::correctness \
              --warn=clippy::self_named_module_files \
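A hypothetical sketch of how a generator could add the flag only when an incremental directory is configured, using std::process::Command. rustc_command is not the nix-backend's actual code and uses only a few of the flags from the excerpt above; -C incremental=<dir> itself is a real rustc codegen option:

```rust
use std::process::Command;

/// Hypothetical helper: build a rustc invocation for one lib unit and append
/// `-C incremental=<dir>` only when an incremental directory is configured.
fn rustc_command(rustc: &str, crate_name: &str, src: &str, incremental: Option<&str>) -> Command {
    let mut cmd = Command::new(rustc);
    cmd.args([
        "--crate-name", crate_name,
        "--edition=2021", src,
        "--crate-type", "lib",
        "--emit=metadata,link",
    ]);
    if let Some(dir) = incremental {
        // inside the sandbox this is the path mapped via extra-sandbox-paths
        cmd.arg("-C").arg(format!("incremental={dir}"));
    }
    cmd
}
```

Keeping the directory optional lets the same generator produce plain, non-incremental invocations when no extra sandbox path is available.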
