haskell.compiler.ghc902Binary: bump LLVM by wrapping opt(1)#440271

Merged
wolfgangwalther merged 2 commits into NixOS:master from emilazy:push-onwtssmnoqpo
Sep 7, 2025

Conversation

@emilazy
Member

@emilazy emilazy commented Sep 4, 2025

Implement a wrapper script to translate the opt(1) arguments passed by the GHC 9.0.2 binary distribution to the equivalent arguments for the new LLVM pass manager passed by GHC ≥ 9.10 and our soon‐to‐be‐patched compilers. This ensures that the bootstrap of GHC 9.4 continues to work on AArch64.
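The kind of translation such a wrapper performs can be sketched as follows. This is a minimal illustration, not the `subopt.bash` script from this PR: the function name `translate_opt_args` is hypothetical, and the set of legacy flags handled (`-O0` to `-O3`, `-mem2reg`, `-globalopt`, `-lower-expect`) is an assumption about what an older GHC might pass. The `-passes=` syntax, including `default<O2>` and `function(...)`, is the new pass manager's pipeline syntax in LLVM's `opt`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; NOT the subopt.bash shipped in this PR.
# Rewrites legacy pass manager flags into a single -passes= pipeline
# and passes every other argument through unchanged, one per line.
translate_opt_args() {
  local arg
  local -a passes=() rest=()
  for arg in "$@"; do
    case "$arg" in
      # -O2 becomes the pipeline element default<O2>, and so on.
      -O[0-3]) passes+=("default<${arg#-}>") ;;
      # Function passes must be nested in function(...) in a module pipeline.
      -mem2reg|-lower-expect) passes+=("function(${arg#-})") ;;
      # globalopt is a module pass and needs no nesting.
      -globalopt) passes+=("globalopt") ;;
      # Anything else (input file, -o, output file, ...) is kept as-is.
      *) rest+=("$arg") ;;
    esac
  done
  if ((${#passes[@]})); then
    local IFS=','
    printf '%s\n' "-passes=${passes[*]}"
  fi
  if ((${#rest[@]})); then
    printf '%s\n' "${rest[@]}"
  fi
}

# Example use inside a wrapper (real_opt is a hypothetical path to LLVM's opt):
#   mapfile -t new_args < <(translate_opt_args "$@")
#   exec "$real_opt" "${new_args[@]}"
```

Under these assumptions, `translate_opt_args -O2 in.ll -o out.bc` would emit `-passes=default<O2>` followed by the untouched arguments. Keeping the translation in a wrapper means the binary distribution itself does not need to be patched.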

On an earlier version of this change, I built haskell.compiler.ghc948 on both aarch64-linux and aarch64-darwin, and haskell.compiler.ghc924 on aarch64-linux only (it is already broken on Darwin). I confirmed that we get functionally identical store outputs before and after this change, modulo self‐references:

$ cp -a result-before/ before
$ cp -a result-after/ after
$ chmod -R +w before after
$ LANG=C find before -type f -exec \
    remove-references-to \
    -t $(readlink result-before) \
    -t $(readlink result-before-doc) \
    '{}' ';'
$ LANG=C find after -type f -exec \
    remove-references-to \
    -t $(readlink result-after) \
    -t $(readlink result-after-doc) \
    '{}' ';'
# Darwin only: normalize build user UIDs in the archive files…
$ LANG=C find before -name '*.a' -exec \
    sed -i 's/ 360 / 351 /g' '{}' ';'
$ diff -r before after
# Linux only: the `package.cache` files differ, presumably due to
# an unrelated reproducibility issue.

Therefore, bumping this LLVM dependency did not affect the end result of the bootstrap for the only compilers it is used for.

Things done

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@emilazy emilazy requested review from a team September 4, 2025 23:15
@nixpkgs-ci nixpkgs-ci bot added labels Sep 4, 2025:

  • 10.rebuild-linux: 501+ (This PR causes many rebuilds on Linux and should normally target the staging branches.)
  • 10.rebuild-darwin: 501+ (This PR causes many rebuilds on Darwin and should normally target the staging branches.)
  • 10.rebuild-darwin: 5001+ (This PR causes many rebuilds on Darwin and must target the staging branches.)
  • 10.rebuild-linux: 5001+ (This PR causes many rebuilds on Linux and must target the staging branches.)
  • 6.topic: haskell (General-purpose, statically typed, purely functional programming language)

@wolfgangwalther
Contributor

I think we should target haskell-updates with this, even though that means it might take a tad longer. That's because otherwise we will have to resolve the conflicts with #439323, #422342 and #381265 much later, when merging between haskell-updates and staging etc. That's assuming that the current staging cycle will still take a while due to x86_64-darwin rebuilds - and then we're off to another round of staging-25.05 - so before this would get to us via staging, it would still take a few weeks.

@emilazy
Member Author

emilazy commented Sep 5, 2025

My inclination is actually the reverse: why not send those compiler drops to master? I certainly support dropping unneeded GHCs – IIRC @sternenseemann said 8.10, the 9.0 source build, and 9.2 could be dropped for 25.11 when we talked about it in the Staging room – and perhaps I could have gotten away with building fewer of them to validate this PR! But they don’t cause a meaningful number of rebuilds, so there’s no need to batch them with Haskell package updates. They could land directly in master with no issues, and get cherry‐picked or merged back into haskell-updates if necessary with no issues too.

I know that haskell-updates often progresses quite slowly, and these GHC changes are one of only three remaining PRs that block removing six versions of LLVM. I’d expect the 25.11 freeze process to start in about a month, and there are other LLVM changes pending that would be made easier by the drops landing, e.g. #436350. So I am anxious to not block the LLVM removals on a somewhat indefinite process, especially as I have other breaking changes I would like to switch gears to working on before the freeze. Waiting for a staging cycle before merging the LLVM drops (or else merging them into staging and dealing with periodic merge fun) already gives me a bit of timing anxiety here. It seems like there is some disagreement on both #439323 and #422342, too, so I would rather not block on those, although I understand the desire to avoid postponing resolving the merge conflicts.

It’s true that this PR will rebuild all the Haskell packages, due to patching the default GHC version, but the rebuilds should not be risky: the patches here only change the LLVM backend, which is not used for any Hydra platform for GHC ≥ 9.2; the main thing it is used for is to bootstrap 9.4 on AArch64, which I have shown results in essentially bit‐identical executable code. As long as the compilers bootstrap and function to a basic level, they shouldn’t regress anything that Hydra builds could catch in the first place, and that can be verified outside of haskell-updates.

Do you have an ETA for when the current haskell-updates will merge into staging? Is the major GHC bump planned for this cycle or the next? There is another option that may be more palatable: we could mark the LLVM backend broken across the board on master, and then send the changes to fix it back up again to staging or haskell-updates. The main temporary casualty would be 9.4 on AArch64, so Haskell would be broken in pkgsStatic on AArch64, and apparently Hedgewars would be too. Or the middle‐ground option: land the opt(1) wrapper on master to unlock bootstrap of 9.4 via the 9.0.2 binary, but mark everything else as broken under LLVM while waiting for the merge, which will ensure that the only thing that regresses on master is cross‐compiling/cross‐compiled GHCs not built by Hydra in the first place, and which should keep the rebuilds minimal. If that sounds sensible to you then I’d be very happy to proceed with that path, as it would take staging out of the LLVM‐dropping equation entirely. But I wanted to propose the option that breaks the fewest things (even temporarily) first before getting clever about the rebuilds.

@wolfgangwalther
Contributor

My inclination is actually the reverse: why not send those compiler drops to master?

Not all of them would be able to go to master, but not opposed in general. I already merged a bunch of uncontroversial drops to haskell-updates, but it should not be a problem to cherry-pick these back to master, too, I think.

For example #440410 affects the bootstrap for darwin, so that must certainly go via haskell-updates (or staging, for that matter).

It’s true that this PR will rebuild all the Haskell packages, due to patching the default GHC version, but the rebuilds should not be risky: the patches here only change the LLVM backend, which is not used for any Hydra platform for GHC ≥ 9.2; the main thing it is used for is to bootstrap 9.4 on AArch64, which I have shown results in essentially bit‐identical executable code. As long as the compilers bootstrap and function to a basic level, they shouldn’t regress anything that Hydra builds could catch in the first place, and that can be verified outside of haskell-updates.

Yeah, I'm not worried about the rebuild introducing regressions, that looks fine.

Do you have an ETA for when the current haskell-updates will merge into staging?

Certainly not soon enough for your schedule, so that won't work out :)

There is another option that may be more palatable: we could mark the LLVM backend broken across the board on master, and then send the changes to fix it back up again to staging or haskell-updates. The main temporary casualty would be 9.4 on AArch64, so Haskell would be broken in pkgsStatic on AArch64, and apparently Hedgewars would be too.

Personally, I'd not be too worried about that, because from my experience pkgsStatic on AArch64 only works well from GHC 9.6 on. IIRC, I would always hit some GHC bugs when trying that with GHC 9.4. So from my perspective, this is not supported well right now anyway.

Marking the LLVM backend broken temporarily would be an OK alternative for me.

Or the middle‐ground option: land the opt(1) wrapper on master to unlock bootstrap of 9.4 via the 9.0.2 binary, but mark everything else as broken under LLVM while waiting for the merge, which will ensure that the only thing that regresses on master is cross‐compiling/cross‐compiled GHCs not built by Hydra in the first place, and which should keep the rebuilds minimal. If that sounds sensible to you then I’d be very happy to proceed with that path, as it would take staging out of the LLVM‐dropping equation entirely. But I wanted to propose the option that breaks the fewest things (even temporarily) first before getting clever about the rebuilds.

I don't feel comfortable reviewing / approving this PR myself, I'm not deep enough into the whole business of building GHC, yet. Thus, I can't really judge this approach. I certainly want an approval from @sternenseemann before this one gets merged.

In general, I think it would make most sense to try to drop as many of the GHC versions first and then figure out how to fix the remaining ones instead of the other way around. Now.. that ship has sailed, because you already did the work :D

@emilazy
Member Author

emilazy commented Sep 5, 2025

Not all of them would be able to go to master, but not opposed in general. I already merged a bunch of uncontroversial drops to haskell-updates, but it should not be a problem to cherry-pick these back to master, too, I think.

It seems like a good idea to me. I can rebase for such cherry‐picks, of course, regardless of the path chosen for these changes.

Personally, I'd not be too worried about that, because from my experience pkgsStatic on AArch64 only works well from GHC 9.6 on. IIRC, I would always hit some GHC bugs when trying that with GHC 9.4. So from my perspective, this is not supported well right now anyway.

Then perhaps we shouldn’t specifically default to 9.4 solely for pkgsStatic.

Marking the LLVM backend broken temporarily would be an OK alternative for me.

In general, I think it would make most sense to try to drop as many of the GHC versions first and then figure out how to fix the remaining ones instead of the other way around. Now.. that ship has sailed, because you already did the work :D

OTOH, I did this work because I first raised dropping old GHCs last year and even with the new policy it has not seemed tractable to achieve sufficient velocity on that to not end up depending on otherwise‐unused LLVM versions indefinitely. These changes decouple all GHC versions from maintenance of LLVM < 19, without sacrifice in backend support, and so provide maximum freedom for Haskell maintainers to make decisions around version support without it having an effect on LLVM version support decisions (at least until we start looking at dropping LLVM 19, but it looked like the in‐flight changes to enable LLVM 20 aren’t very complex either).

From discussion with @sternenseemann, it sounds like 9.4 is not going to get dropped until the issues Hadrian has with cross‐compiling GHCs are resolved, so making the bootstrap work there and patching it for newer LLVMs (to be able to cross‐compile a GHC for new platforms in the first place) is unavoidable. I would personally be happy to just mark the LLVM backend broken for versions above 9.4 and below 9.12, but most of the work here was just in corralling the patches required to fix 9.4; keeping the backend working for the intermediate versions is pretty cheap.

Anyway, merging the opt(1) bootstrap thing separately and just marking the source builds as broken with LLVM would be a much smaller diff than this and not differ meaningfully based on what compilers are present or what the long‐term goal is for GHC LLVM support. Since it can be mechanically verified to successfully produce a bootstrapped GHC 9.4 with identical machine code on AArch64, there’s no question that it works perfectly for all the cases that matter to Hydra. I believe it would result in zero regressions in terms of Hydra jobs outside of GHC 8.10 and 9.0 themselves, and only a handful of rebuilds.

The commit to patch the source releases for newer LLVMs could then be sent to haskell-updates or staging and it doesn’t make a big difference to me how long that process takes. The only question there is whether it’s considered palatable to temporarily break GHC < 9.12 on RISC‐V, GHC ≥ 9.4 on SPARC, and all GHCs targeting more obscure architectures. I opened this to bump everything at once as a starting point, to leave that call to the Haskell maintainers. But I would personally prefer that route, as it avoids bringing staging into the critical chain of work that is already essentially complete.

@wolfgangwalther
Contributor

Then perhaps we shouldn’t specifically default to 9.4 solely for pkgsStatic

Well, pkgsStatic for GHC 9.6 is even more broken in general, because Template Haskell will not work at all. So there is a good reason to stick to GHC 9.4 for that, right now.

@emilazy
Member Author

emilazy commented Sep 5, 2025

Certainly not soon enough for your schedule, so that won't work out :)

FWIW, the only hard deadline from my POV is whenever the 25.11 freeze gets scheduled for. But it would be convenient to be able to base further work on top of the cleaned‐up LLVM, to avoid rebasing the drop PR repeatedly, to not have to worry about reverting PRs that add dependencies on LLVMs that the authors don’t realize are actually already dead, etc., so I have a strong preference for at least not waiting more than one more staging cycle for the drops, and preferably for getting them directly into master instead.

@emilazy
Member Author

emilazy commented Sep 6, 2025

@sternenseemann Thoughts on this approach? It would reduce the blocking part of the review here to just 9.0.2-binary.nix and subopt.bash (well, plus the bump in haskell-packages.nix and the broken markers), which could be sent to master, and then I can send the patches to re‐enable LLVM support to haskell-updates or wherever else to be reviewed at whatever pace works for your availability.

Anyway, merging the opt(1) bootstrap thing separately and just marking the source builds as broken with LLVM would be a much smaller diff than this and not differ meaningfully based on what compilers are present or what the long‐term goal is for GHC LLVM support. Since it can be mechanically verified to successfully produce a bootstrapped GHC 9.4 with identical machine code on AArch64, there’s no question that it works perfectly for all the cases that matter to Hydra. I believe it would result in zero regressions in terms of Hydra jobs outside of GHC 8.10 and 9.0 themselves, and only a handful of rebuilds.

The commit to patch the source releases for newer LLVMs could then be sent to haskell-updates or staging and it doesn’t make a big difference to me how long that process takes. The only question there is whether it’s considered palatable to temporarily break GHC < 9.12 on RISC‐V, GHC ≥ 9.4 on SPARC, and all GHCs targeting more obscure architectures. I opened this to bump everything at once as a starting point, to leave that call to the Haskell maintainers. But I would personally prefer that route, as it avoids bringing staging into the critical chain of work that is already essentially complete.

@sternenseemann
Member

sternenseemann commented Sep 6, 2025

I think we should merge this onto the fastest branch possible and then cherry pick the changes onto haskell-updates (or the other way around) so we get better visibility (though it is somewhat pointless since the problems are unlikely to become apparent on x86_64-linux) and avoid painful merge conflicts later.

I would prefer not to break master if we can avoid it. We can of course split out the bindist changes and target staging (if you're using that for the LLVM changes as well). If we take too long reviewing the other batch of changes, so be it.

@emilazy
Member Author

emilazy commented Sep 6, 2025

I would prefer not to break master if we can avoid it.

To be clear, if we ignore 8.10 and 9.0 on AArch64 (which could be resolved by just merging their drops to master beforehand), then the opt(1) bootstrap change for the AArch64 9.0.2 binary → 9.4 path would regress zero Hydra jobs from the main release jobset – it’s only GHC cross to architectures lacking NCG that would be marked broken for the duration of one staging or haskell-updates cycle. Everything built by Hydra, including 9.4 on AArch64, pkgsStatic, etc., would continue to work.

Would you still prefer to go via staging, given that? I can split up the PR for sure, but it feels a little awkward to send the very‐low‐rebuilds bootstrap change to staging, especially when that is the one that would unblock the drops on master.

@sternenseemann
Member

GHC 9.10 also failed; I don’t know what the cause is (armv7l-unknown-linux-gnueabihf-ghc: could not execute: clang is not very helpful), but it also occurs on master.

This smells like https://gitlab.haskell.org/ghc/ghc/-/commit/ab533e711e60849fe4cde489644b71df71d3ca47 which I only discovered while reviewing this PR.

@sternenseemann
Member

I can split up the PR for sure, but It feels a little awkward to send the very‐low‐rebuilds bootstrap change to staging, especially when that is the one that would unblock the drops on master.

Right, we need that to remove 8.10.7. Didn't think about that.

@emilazy
Member Author

emilazy commented Sep 6, 2025

I meant the LLVM drops. But I don’t think the subopt change should affect dropping 8.10? It only affects the 9.0.2 bindist.

@emilazy
Member Author

emilazy commented Sep 6, 2025

This smells like https://gitlab.haskell.org/ghc/ghc/-/commit/ab533e711e60849fe4cde489644b71df71d3ca47 which I only discovered while reviewing this PR.

Sounds right; so GHC ≥ 9.10 need clang adding to their runtime dependencies, but it’s not related to this PR at least.

@sternenseemann
Member

Sounds right; so GHC ≥ 9.10 need clang adding to their runtime dependencies, but it’s not related to this PR at least.

#440761

@emilazy
Member Author

emilazy commented Sep 7, 2025

Otherwise I’ll try to get to it tomorrow.

It’s tomorrow. Exactly as before on both Linux and Darwin for 9.4.8, so LLVM 20 works fine for this. I’d rather pursue something like #440794 than build 9.2, so from my POV this is ready to go.

@nixpkgs-ci nixpkgs-ci bot added the 2.status: merge conflict (This PR has merge conflicts with the target branch) label Sep 7, 2025
@emilazy emilazy changed the title from "haskell.compiler.ghc{8107{,Binary},902{,Binary},928,948,963,967,984,9101,9102,9121,9122}: bump LLVM" to "haskell.compiler.ghc902Binary: bump LLVM by wrapping opt(1)" Sep 7, 2025
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-darwin: 1-10 (This PR causes between 1 and 10 packages to rebuild on Darwin.) and removed 2.status: merge conflict (This PR has merge conflicts with the target branch) and 10.rebuild-darwin: 11-100 (This PR causes between 11 and 100 packages to rebuild on Darwin.) labels Sep 7, 2025
@wolfgangwalther
Contributor

This needs a rebase.

These binary packages are available for a fixed set of platforms,
all of which support the native code generator. Therefore, the
`llvmPackages` argument was never used. We leave an assertion around,
just in case.
Implement a wrapper script to translate the `opt(1)` arguments passed
by the GHC 9.0.2 binary distribution to the equivalent arguments
for the new LLVM pass manager passed by GHC ≥ 9.10 and our
soon‐to‐be‐patched compilers. This ensures that the bootstrap
of GHC 9.4 continues to work on AArch64.

On an earlier version of this change, I built `haskell.compiler.ghc948`
on both `aarch64-linux` and `aarch64-darwin`, and
`haskell.compiler.ghc924` on `aarch64-linux` only (it is already
broken on Darwin). I confirmed that we get functionally identical
store outputs before and after this change, modulo self‐references:

    $ cp -a result-before/ before
    $ cp -a result-after/ after
    $ chmod -R +w before after
    $ LANG=C find before -type f -exec \
        remove-references-to \
        -t $(readlink result-before) \
        -t $(readlink result-before-doc) \
        '{}' ';'
    $ LANG=C find after -type f -exec \
        remove-references-to \
        -t $(readlink result-after) \
        -t $(readlink result-after-doc) \
        '{}' ';'
    # Darwin only: normalize build user UIDs in the archive files…
    $ LANG=C find before -name '*.a' -exec \
        sed -i 's/ 360 / 351 /g' '{}' ';'
    $ diff -r before after
    # Linux only: the `package.cache` files differ, presumably due to
    # an unrelated reproducibility issue.

Therefore, bumping this LLVM dependency did not affect the end result
of the bootstrap for the only compilers it is used for.
@emilazy
Member Author

emilazy commented Sep 7, 2025

Done.

@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 1-10 (This PR causes between 1 and 10 packages to rebuild on Linux.) and removed 10.rebuild-linux: 11-100 (This PR causes between 11 and 100 packages to rebuild on Linux.) labels Sep 7, 2025
@wolfgangwalther
Contributor

Thank you!

@wolfgangwalther wolfgangwalther merged commit 50a00d8 into NixOS:master Sep 7, 2025
29 of 32 checks passed
@emilazy
Member Author

emilazy commented Sep 7, 2025

Thanks for the quick reviews and collaboration on this stuff! :)

@emilazy emilazy deleted the push-onwtssmnoqpo branch September 7, 2025 18:39
@wegank
Member

wegank commented Sep 7, 2025

Hello, I just got back to Paris from NixCon and I noticed that the check

enableManpages ? buildPackages.pandoc.compiler.bootstrapAvailable,

now throws an error on LoongArch (and therefore also on RISC-V) instead of evaluating to false. The error message suggests that the assertion

assert import ./common-have-ncg.nix { inherit lib stdenv version; };

failed. This is also reproducible in nix repl with the system set to loongarch64-linux. Could you please take a look to see how to fix this?

Comment on lines +214 to +215
assert import ./common-have-ncg.nix { inherit lib stdenv version; };

Contributor

Right.. these asserts should not be placed at the top-level, otherwise they will prevent evaluating meta.available.

This assert could either be deferred until after meta (by putting it on some derivation argument instead) or the boolean returned by the import could be put in meta.broken instead?

Member Author

broken sounds like the way to go. Although frankly I can't imagine when we'd add other architectures for these that need LLVM without noticing that they don't work to do the bootstrap they're intended for, so I'm not sure how much value the check adds.

Can put up a PR in an hour or so.

Member Author

Fixed in #441069.

@trofi
Contributor

trofi commented Sep 9, 2025

Bisect says that 1bda5b1 haskell.compiler.ghc{924,963,984}Binary: remove LLVM‐related dead code caused eval failures in master for ghc.bootPkgs.GlomeVec:

$ nix-instantiate -A ghc.bootPkgs.GlomeVec
error:
       … while calling the 'derivationStrict' builtin
         at <nix/derivation-internal.nix>:37:12:
           36|
           37|   strict = derivationStrict drvAttrs;
             |            ^
           38|

       … while evaluating derivation 'GlomeVec-0.2'
         whose name attribute is located at pkgs/stdenv/generic/make-derivation.nix:539:13

       … while evaluating attribute 'buildInputs' of derivation 'GlomeVec-0.2'
         at pkgs/stdenv/generic/make-derivation.nix:591:13:
          590|             depsHostHost = elemAt (elemAt dependencies 1) 0;
          591|             buildInputs = elemAt (elemAt dependencies 1) 1;
             |             ^
          592|             depsTargetTarget = elemAt (elemAt dependencies 2) 0;

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: expected a set but found null: null

@wolfgangwalther
Contributor

Bisect says that 1bda5b1 haskell.compiler.ghc{924,963,984}Binary: remove LLVM‐related dead code caused eval failures in master for ghc.bootPkgs.GlomeVec:

#441101 seems to fix this.

sternenseemann added a commit to sternenseemann/nixpkgs that referenced this pull request Sep 11, 2025
If we don't explicitly pass something as LLVMAS, GHC >= 9.10 will try to
invoke "clang" at runtime which fails.

Upstream change: https://gitlab.haskell.org/ghc/ghc/-/merge_requests/12005

I've tested pkgsCross.armv7l-hf-multiplatform.buildPackages.haskell.compiler.ghc9102
which was reported to be failing here:
NixOS#440271 (comment)
@emilazy emilazy mentioned this pull request Sep 16, 2025
1 task
marcusramberg pushed a commit to marcusramberg/nixpkgs that referenced this pull request Sep 29, 2025

Labels

  • 6.topic: haskell (General-purpose, statically typed, purely functional programming language)
  • 10.rebuild-darwin: 1-10 (This PR causes between 1 and 10 packages to rebuild on Darwin.)
  • 10.rebuild-linux: 1-10 (This PR causes between 1 and 10 packages to rebuild on Linux.)
  • 12.approvals: 1 (This PR was reviewed and approved by one person.)
  • 12.approved-by: package-maintainer (This PR was reviewed and approved by a maintainer listed in any of the changed packages.)
