
minimal stage1 #118

Open
angerman wants to merge 105 commits into stable-ghc-9.14 from feat/minimal-stage1


@angerman

This is a continuation of @luite's shrinkstage1 MR.

angerman and others added 17 commits November 20, 2025 12:10
This change reverts part of !14544, which forces the bootstrap
compiler to have ghc-internal.  As such it breaks booting with
ghc 9.8.4. A better solution would be to make this conditional
on the ghc version in the cabal file!
…ernal

If the boot compiler doesn't have ghc-internal, use "<unavailble>" as the
`cGhcInternalUnitId`.  This allows booting with older compilers.
Subsequent stage2 compilers get the proper ghc-internal id from
the stage1 compiler that boots them.

Mermaid is a common diagram format that can be inlined in Markdown
files, and GitHub, for example, will even render it.  This change adds
support for Mermaid diagram output to ghc-pkg.

This adds support to ghc-pkg to infer a package-db from a target name.

Make the first simple optimization pass after desugaring a real CoreToDo
pass. This allows CorePlugins to decide whether they want to be executed
before or after this pass.

It's more user-friendly to directly print the right thing instead of
requiring the user to retry with the additional `-dppr-debug` flag.

The referenced issue 20706 also doesn't list T13786 as a broken test.

By mistake we tried to use deriveConstants without passing
`--gcc-flag -fcommon` (which Hadrian does) and it failed.

This patch adds deriveConstants support for constants stored in the .bss
section so that deriveConstants works without passing `-fcommon` to the C
compiler.

Comment on lines +98 to +99
, stage0 `cabalFlag` "minimal"
, stage0 `cabalFlag` "no-uncommon-ncgs"

I wonder whether we should actually care about Hadrian here.

@angerman force-pushed the feat/minimal-stage1 branch 2 times, most recently from d627df0 to b5c275f on November 27, 2025 01:07
Stable Haskell Team and others added 9 commits November 28, 2025 14:56
This commit restructures the Runtime System (RTS) components for better
modularity and reusability across different build configurations. The
changes enable cleaner separation of concerns and improved support for
cross-compilation scenarios.

Key changes:
- Extract RTS headers into standalone rts-headers package
  * Moved include/rts/Bytecodes.h to rts-headers
  * Moved include/rts/storage/ClosureTypes.h to rts-headers
  * Moved include/rts/storage/FunTypes.h to rts-headers
  * Moved include/stg/MachRegs/* to rts-headers
- Create rts-fs package for filesystem utilities
  * Extracted filesystem code from utils/fs
  * Provides reusable filesystem operations for RTS
- Rename utils/iserv to utils/ghc-iserv for consistency
  * Better naming alignment with other GHC utilities
  * Updated all references throughout the codebase
- Update RTS configuration and build files
  * Modified rts/configure.ac for new structure
  * Updated rts.cabal with new dependencies
  * Adjusted .gitignore for new artifacts

Rationale:
The modularization allows different stages of the compiler build to
share common RTS components without circular dependencies. This is
particularly important for:
- Cross-compilation where host and target RTS differ
- JavaScript backend which needs selective RTS components
- Stage1/Stage2 builds that require different RTS configurations

Contributors:
- Moritz Angermann: RTS modularization architecture and implementation
- Sylvain Henry: JavaScript backend RTS adjustments
- Andrea Bedini: Build system integration

This refactoring maintains full backward compatibility while providing
a cleaner foundation for multi-target support.

This commit introduces a comprehensive cabal-based build infrastructure
to support multi-target and cross-compilation scenarios for GHC. The new
build system provides a clean separation between different build stages
and better modularity for toolchain components.

Key changes:
- Add Makefile with stage1, stage2, and stage3 build targets
- Create separate cabal.project files for each build stage
- Update configure.ac for new build system requirements
- Adapt hie.yaml to support cabal-based builds
- Update GitHub CI workflow for new build process

Build stages explained:
- Stage 1: Bootstrap compiler built with system GHC
- Stage 2: Intermediate compiler built with Stage 1
- Stage 3: Final compiler built with Stage 2 (for validation)

This modular approach enables:
- Clean cross-compilation support
- Better dependency management
- Simplified build process for different targets
- Improved build reproducibility

Contributors:
- Andrea Bedini: Build system design and Makefile implementation
- Moritz Angermann: Cross-compilation infrastructure

The new build system maintains compatibility with existing workflows
while providing a more maintainable foundation for future enhancements.
@angerman force-pushed the feat/minimal-stage1 branch from f93400f to e699588 on December 10, 2025 04:36
angerman and others added 24 commits December 11, 2025 13:38
When a GC cycle straddles the exit boundary (starts before stat_startExit()
but finishes during the exit phase), the calculated exit_gc_elapsed can
exceed the actual exit duration, resulting in negative exit_elapsed_ns.

This occurs because:
1. stat_startExit() captures start_exit_gc_elapsed = stats.gc_elapsed_ns
   (which doesn't include the in-progress GC)
2. When the straddling GC completes, its FULL duration is added to
   stats.gc_elapsed_ns
3. exit_gc_elapsed = stats.gc_elapsed_ns - start_exit_gc_elapsed now
   includes GC time from BEFORE exit started

This was observed on Alpine Linux (musl libc) where different scheduler
behavior or timing granularity makes the race condition more likely to
manifest.

Fix by clamping exit_cpu_ns and exit_elapsed_ns to zero when negative,
matching the existing pattern for mutator_cpu_ns. These statistics are
best-effort approximations, and this edge case is rare.

Also remove WARNs that can fire erroneously in timing edge cases:
- WARN(exit_gc_elapsed > 0) - fires if no GC during exit
- WARN(stats.mutator_elapsed_ns >= 0) - same timing edge case
- WARN(INIT + MUT + GC + EXIT == total) - violated by clamping

See Note [Clamping exit_cpu_ns and exit_elapsed_ns] in rts/Stats.c.
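
The accounting problem described above can be modelled in a few lines. This is only a sketch of the arithmetic, not the code in rts/Stats.c (which is C); the function names `clampNonNegative` and `exitElapsed` and the sample numbers are assumptions made for illustration.

```haskell
module Main (main) where

import Data.Int (Int64)

-- | Nanosecond counters, signed so that a negative intermediate is representable.
type Ns = Int64

-- | Clamp a derived duration to zero when accounting skew makes it negative.
clampNonNegative :: Ns -> Ns
clampNonNegative = max 0

-- | Exit time excluding GC: the exit duration minus the GC time that was
-- credited after the exit snapshot was taken (hypothetical helper).
exitElapsed
  :: Ns  -- ^ wall-clock duration of the exit phase
  -> Ns  -- ^ GC elapsed counter captured when exit started
  -> Ns  -- ^ GC elapsed counter at the end of the run
  -> Ns
exitElapsed exitDuration gcAtExitStart gcAtEnd =
  clampNonNegative (exitDuration - (gcAtEnd - gcAtExitStart))

main :: IO ()
main = do
  -- Straddling GC: it ran 10 ms in total, 8 ms of it before exit began, but
  -- the full 10 ms lands after the snapshot. The exit phase lasted only 3 ms,
  -- so the unclamped result would be -7 ms.
  print (exitElapsed 3000000 50000000 60000000)  -- prints 0
  -- No straddling GC: 1 ms of GC inside a 3 ms exit phase.
  print (exitElapsed 3000000 50000000 51000000)  -- prints 2000000
```

Clamping trades exactness for sanity: the components no longer have to sum to the total, which is why the corresponding WARN is removed.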

Add "Stable Haskell Edition" branding to user-visible output while
maintaining drop-in compatibility with upstream GHC:

- ghc --version: Append "(Stable Haskell Edition)" suffix
- ghc -v2 banner: Add edition to verbose compiler banner
- GHCi welcome: Add edition and update URL to GitHub repo
- ghc --info: Add new "Edition" field (keeps "Project name" unchanged)
- Bug reports: Redirect all URLs to github.com/stable-haskell/ghc/issues

All internal identifiers (cProjectVersion, unit IDs, etc.) remain
unchanged to preserve ABI and tool compatibility.

The branding commit changed the bug report URL from
haskell.org/ghc/reportabug to github.com/stable-haskell/ghc/issues.
Update test expectation files to match the new URL output.

Fixes CI failures in T11223_link_order_a_b_2_fail and
T11223_simple_duplicate_lib tests across all platforms.

Add a new frontend-plugins cabal flag to control whether frontend plugin
loading support is enabled. This allows stage1 builds to disable
plugin loading (-frontend-plugins), since GHC.Runtime.Loader depends on
the ghci package.

- Add frontend-plugins flag to ghc-bin.cabal.in (default: True)
- Guard loadFrontendPlugin and initializeSessionPlugins with CPP
- Use defaultFrontendPlugin fallback when plugins disabled
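
A guard with a fallback of this kind might look roughly as follows. This is a self-contained sketch, not the code in ghc-bin: the macro name `HAVE_FRONTEND_PLUGINS`, the simplified `FrontendPlugin` record, and the bodies of both branches are assumptions for illustration.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- HAVE_FRONTEND_PLUGINS is a hypothetical macro name for this sketch; it would
-- be set through cpp-options when the cabal flag is enabled.

-- | Simplified stand-in for a frontend plugin: a name and an action.
data FrontendPlugin = FrontendPlugin
  { pluginName :: String
  , runPlugin  :: [String] -> IO ()
  }

-- | Fallback used when plugin loading is compiled out.
defaultFrontendPlugin :: FrontendPlugin
defaultFrontendPlugin = FrontendPlugin
  { pluginName = "default"
  , runPlugin  = \_ -> putStrLn "frontend plugins are not supported in this build"
  }

-- | Resolve a plugin by name. Only the guarded branch would need the loader.
loadFrontendPlugin :: String -> IO FrontendPlugin
#if defined(HAVE_FRONTEND_PLUGINS)
loadFrontendPlugin name =
  pure (FrontendPlugin name (\args -> putStrLn ("running " ++ name ++ " " ++ unwords args)))
#else
loadFrontendPlugin _ = pure defaultFrontendPlugin
#endif

main :: IO ()
main = do
  plugin <- loadFrontendPlugin "MyPlugin"
  putStrLn (pluginName plugin)
  runPlugin plugin ["--flag"]
```

Keeping the type signature outside the `#if` means both configurations are checked against the same interface, so callers need no guards of their own.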

Add +minimal and +no-uncommon-ncgs flags to reduce stage1 compiler size.

The +minimal flag removes:
- Bytecode interpreter (GHC.ByteCode.*)
- JavaScript backend (GHC.StgToJS.*)
- WebAssembly backend (GHC.Wasm.*)
- GHCi and interactive features
- Template Haskell execution (can parse/typecheck, can't run)

The +no-uncommon-ncgs flag removes native code generators for:
- PowerPC (PPC)
- RISC-V 64-bit (RV64)
- LoongArch64

This commit also adds comprehensive CPP guards throughout the compiler
to support MINIMAL builds that can compile code and link executables
via the system linker without GHCi, Template Haskell execution,
bytecode interpreter, or runtime loading functionality.

Key guarded modules:
- GHC.hs: Guard interactive/GHCi exports and imports
- GHC/Driver/Main.hs: Guard plugin initialization and interpreter code
- GHC/Driver/Make.hs: Guard plugin init, add stub for addSptEntries
- GHC/Driver/Pipeline.hs: Guard JS linker
- GHC/Driver/Pipeline/Execute.hs: Guard JS-specific phases
- GHC/Driver/Session/Inspect.hs: Guard backend checks for interpreter
- GHC/Tc/Gen/Splice.hs: Add stubs for TH execution functions
- GHC/Unit/Module/Graph.hs: Make showModMsg LinkNode unconditional
- CmmToAsm/*: Guard PPC/RISCV64/LoongArch64 code generation
- GHC/Platform/Regs.hs: Guard PPC/RISCV64/LoongArch64 platform regs

Stage1 builds 801 modules (vs 885+ for a full build); stage2 builds
complete successfully and yield a full-featured compiler.

Co-authored-by: Moritz Angermann <moritz.angermann@gmail.com>

Replace the broad +minimal and +no-uncommon-ncgs flags with granular,
topic-specific flags that use positive (opt-in) HAVE_* CPP defines.

New flag system in ghc.cabal.in:

**Native Code Generator flags:**
- x86-ncg: HAVE_X86_NCG (default: True)
- aarch64-ncg: HAVE_AARCH64_NCG (default: True)
- ppc-ncg: HAVE_PPC_NCG (default: True)
- riscv64-ncg: HAVE_RISCV64_NCG (default: True)
- loongarch64-ncg: HAVE_LOONGARCH64_NCG (default: True)

**Backend flags:**
- js-backend: HAVE_JS_BACKEND (default: True)
- wasm-backend: HAVE_WASM_BACKEND (default: True)
- llvm-backend: HAVE_LLVM_BACKEND (default: True)

**Runtime feature flags:**
- interpreter: HAVE_INTERPRETER (default: True)
- dynamic-linker: HAVE_DYNAMIC_LINKER (default: True)

CPP guard updates:
- #if !defined(MINIMAL) -> #if defined(HAVE_INTERPRETER)
- #if defined(MINIMAL) -> #if !defined(HAVE_INTERPRETER)
- #if !defined(NO_UNCOMMON_NCGS) -> individual #if defined(HAVE_*_NCG)

Updated cabal.project.stage1 to use new flags:
  +x86-ncg +aarch64-ncg -ppc-ncg -riscv64-ncg -loongarch64-ncg
  -js-backend -wasm-backend +llvm-backend
  -interpreter -dynamic-linker

Stage1 now builds 801 modules (vs 885 for full build), excluding:
- ByteCode interpreter modules
- JavaScript backend (StgToJS)
- WebAssembly backend
- Uncommon NCGs (PPC, RISC-V, LoongArch64)
- GHCi/interpreter runtime support
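
To illustrate the positive (opt-in) style, here is a small standalone sketch in which each feature exists only when its HAVE_* macro is defined. The macro names are taken from the list above; the `Backend` type and `enabledBackends` are hypothetical and not part of GHC.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- | Backends this sketch knows about (a subset, for brevity).
data Backend = X86 | AArch64 | PPC | JS | LLVM
  deriving (Show, Eq)

-- | Backends compiled into this binary. Each entry is present only when its
-- HAVE_* macro is defined, so a build with no defines yields an empty list.
enabledBackends :: [Backend]
enabledBackends = concat
  [ []
#if defined(HAVE_X86_NCG)
  , [X86]
#endif
#if defined(HAVE_AARCH64_NCG)
  , [AArch64]
#endif
#if defined(HAVE_PPC_NCG)
  , [PPC]
#endif
#if defined(HAVE_JS_BACKEND)
  , [JS]
#endif
#if defined(HAVE_LLVM_BACKEND)
  , [LLVM]
#endif
  ]

main :: IO ()
main = print enabledBackends
```

Compiling with, for example, `ghc -DHAVE_X86_NCG -DHAVE_LLVM_BACKEND Main.hs` enables exactly those two entries. With opt-in defines, forgetting a flag removes a feature instead of silently including it, which suits a minimal stage1.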

…dant guards

This commit reduces CPP spread in the GHC NCG codebase through two approaches:

**Phase 1: Remove redundant CPP from conditional modules**

Modules that are only compiled when a cabal flag is enabled (e.g.,
`flag(interpreter)`) don't need internal CPP guards checking that same flag.

- GHC/ByteCode/Linker.hs: Remove #else stub block (module is already
  conditional via ghc.cabal.in)
- GHC/Runtime/Eval.hs: Remove 3 CPP import guards
- GHC/Runtime/Loader.hs: Remove CPP import guard

**Phase 2: Record-based dispatch pattern for NCG**

Replaced repetitive CPP in Target.hs and FreeRegs.hs with a cleaner
record-based dispatch pattern:

- GHC/CmmToAsm/Reg/Target.hs: Major refactor
  - Created `RegTarget` record bundling all 5 register operations
  - Each architecture defines its RegTarget in ONE CPP block (was 5)
  - Single `selectRegTarget :: Platform -> RegTarget` dispatch function
  - Exported functions are simple wrappers with no CPP
  - Reduces CPP blocks from ~25 to ~5 (one per arch)

- GHC/CmmToAsm/Reg/Linear/FreeRegs.hs: Applied same pattern to maxSpillSlots
  - Single `selectMaxSpillSlots` dispatch function
  - Cleaner unavailable arch handling

This pattern consolidates CPP to:
1. Conditional imports (required - modules are conditionally compiled)
2. Conditional arch-specific record definitions (one block per arch)
3. Single dispatch function with conditional entries

Net result: 59 fewer lines, cleaner code, easier to add/remove architectures.
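
The record-based dispatch idea can be sketched without any GHC internals. Everything below is a simplified assumption: `Arch` stands in for `Platform`, the record holds two made-up operations instead of five, the spill-slot numbers are arbitrary, and `selectRegTarget` returns a `Maybe` here to model architectures that are compiled out.

```haskell
module Main (main) where

-- | Simplified stand-in for GHC's Platform; only the architecture matters here.
data Arch = ArchX86_64 | ArchAArch64 | ArchPPC
  deriving (Show, Eq)

-- | Bundle of per-architecture register operations (two hypothetical ones).
data RegTarget = RegTarget
  { rtMaxSpillSlots :: Int
  , rtRegName       :: Int -> String
  }

-- In the real code each of these definitions would sit in its own CPP block.
x86Target :: RegTarget
x86Target = RegTarget { rtMaxSpillSlots = 64, rtRegName = \n -> "%r" ++ show n }

aarch64Target :: RegTarget
aarch64Target = RegTarget { rtMaxSpillSlots = 32, rtRegName = \n -> "x" ++ show n }

-- | Single dispatch point. Architectures without an entry yield Nothing.
selectRegTarget :: Arch -> Maybe RegTarget
selectRegTarget ArchX86_64  = Just x86Target
selectRegTarget ArchAArch64 = Just aarch64Target
selectRegTarget _           = Nothing

-- | Exported wrapper: no per-architecture logic, only a record field lookup.
regName :: Arch -> Int -> String
regName arch n =
  maybe ("<no NCG for " ++ show arch ++ ">") (\t -> rtRegName t n) (selectRegTarget arch)

main :: IO ()
main = do
  putStrLn (regName ArchX86_64 3)  -- prints %r3
  putStrLn (regName ArchPPC 3)     -- prints <no NCG for ArchPPC>
```

Adding or removing an architecture touches one record definition and one line of the dispatch function, rather than every exported operation.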

Create GHC.Runtime.Interpreter.Stubs module containing all stub type
definitions for builds without interpreter support. This eliminates
duplication of stub types across 6+ files.

Centralized types include:
- Core GHCi types: HValue, ForeignRef, ForeignHValue, RemoteRef, RemotePtr, HValueRef
- Communication types: Pipe, LoadedDLL
- Break/debug types: BreakArray, InternalBreakpointId
- Template Haskell types: THMessage, THResultType
- Eval types: EvalExpr, ResumeContext, EvalStep
- StgToJS types: LinkPlan, StgToJSConfig
- Linker environment types: ItblEnv, AddrEnv

Updated files to import from centralized module:
- GHC/Linker/Types.hs
- GHC/Runtime/Eval/Types.hs
- GHC/Runtime/Interpreter/Types.hs
- GHC/Driver/Hooks.hs
- GHC/Hs/Expr.hs
- GHC/Tc/Gen/Splice.hs

The Stubs module is conditionally compiled via ghc.cabal.in only when
the interpreter flag is disabled (!flag(interpreter)).

This reduces code duplication and provides a single source of truth
for all stub type definitions in minimal/non-interpreter builds.

The file uses #if defined(HAVE_INTERNAL_INTERPRETER) guards but was
missing the {-# LANGUAGE CPP #-} pragma, causing a parse error.

While the module is only compiled when flag(interpreter) is enabled,
it still needs CPP for the more specific HAVE_INTERNAL_INTERPRETER
guards within the module.

Safe Haskell import checking (hscCheckSafeImports) is a compile-time
feature that should work regardless of interpreter support. The previous
CPP guard `#if !defined(HAVE_INTERPRETER)` caused stage1 builds (which
use the -interpreter flag) to skip Safe Haskell checking when compiling
boot libraries.

This resulted in incorrect Safe Haskell metadata in .hi files, causing
testsuite failures for Safe Haskell tests (BadImport, Dep, ImpSafe,
safePkg01, etc.) because ghc-internal modules compiled by stage1 had
different safety inferences than expected.

The fix removes the CPP guard so hscCheckSafeImports always runs,
ensuring consistent Safe Haskell behavior regardless of whether the
compiler has interpreter support enabled.

This commit addresses all review comments from PR 118:

1. Fix HAVE_INTERPRETER vs HAVE_JS_BACKEND mismatch (Critical)
   - Split CPP guards: HAVE_INTERPRETER for bytecode/interpreter
   - Use HAVE_JS_BACKEND for StgToJS modules
   - Affected files: GHC/Driver/Main.hs, Pipeline.hs, Pipeline/Execute.hs

2. Clean up "MINIMAL build" references
   - Replace with explicit CPP flag references
   - e.g., "HAVE_INTERPRETER not defined" instead of "MINIMAL build"
   - Affected files: GHC.hs, Driver/Main.hs, Driver/Make.hs,
     Driver/Session/Inspect.hs, Driver/Pipeline/Execute.hs,
     Tc/Types.hs, Tc/Gen/Splice.hs

3. Refactor RegTarget to GADT-based approach
   - Introduce ArchKind data type for type-level architecture tags
   - Use GADT RegTarget with architecture-specific constructors
   - Add SomeRegTarget existential wrapper for runtime dispatch
   - Define RegOps typeclass with per-architecture instances
   - Add withRegTarget helper using rank-2 types
   - Add INLINE pragmas for performance
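
A minimal sketch of the GADT-based shape described in item 3 follows. The names `ArchKind`, `RegTarget`, `SomeRegTarget`, `RegOps`, and `withRegTarget` come from the commit message; the constructors, the two class methods, their bodies, and the string-keyed `selectRegTarget` are assumptions for illustration and do not reflect the actual definitions in GHC/CmmToAsm/Reg/Target.hs.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RankNTypes #-}
module Main (main) where

-- | Architecture tags, promoted to the type level by DataKinds.
data ArchKind = X86 | AArch64

-- | One constructor per architecture; the index records which one it is.
data RegTarget (a :: ArchKind) where
  X86Target     :: RegTarget 'X86
  AArch64Target :: RegTarget 'AArch64

-- | Per-architecture operations, resolved by instance selection.
class RegOps (a :: ArchKind) where
  maxSpillSlots :: RegTarget a -> Int
  regName       :: RegTarget a -> Int -> String

instance RegOps 'X86 where
  maxSpillSlots _ = 64  -- arbitrary value for the sketch
  regName _ n = "%r" ++ show n
  {-# INLINE regName #-}

instance RegOps 'AArch64 where
  maxSpillSlots _ = 32  -- arbitrary value for the sketch
  regName _ n = "x" ++ show n
  {-# INLINE regName #-}

-- | Existential wrapper so the architecture can be chosen at runtime.
data SomeRegTarget where
  SomeRegTarget :: RegOps a => RegTarget a -> SomeRegTarget

-- | Runtime dispatch; the string key is a stand-in for a Platform value.
selectRegTarget :: String -> Maybe SomeRegTarget
selectRegTarget "x86_64"  = Just (SomeRegTarget X86Target)
selectRegTarget "aarch64" = Just (SomeRegTarget AArch64Target)
selectRegTarget _         = Nothing

-- | Unpack the existential and run a continuation that works for any architecture.
withRegTarget :: SomeRegTarget -> (forall a. RegOps a => RegTarget a -> r) -> r
withRegTarget (SomeRegTarget t) k = k t

main :: IO ()
main = case selectRegTarget "aarch64" of
  Nothing -> putStrLn "no NCG for this architecture"
  Just t  -> putStrLn (withRegTarget t (\rt -> regName rt 3))  -- prints x3
```

Compared with a plain record, the type index lets code that is specific to one architecture demand `RegTarget 'X86` in its signature, while generic code goes through `withRegTarget`.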

The GADT refactoring of Reg/Target exports RegOps(..) which includes
mkVirtualReg as a typeclass method. PPC/CodeGen.hs imports both PPC.Regs
(which has its own mkVirtualReg) and Reg.Target, creating an ambiguous
reference.

Fix by using an explicit import list for Reg.Target, importing only
targetClassOfReg which is the only function actually used.
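
The same kind of clash, and the same fix, can be reproduced with modules from base. This is an analogy only, not the PPC/CodeGen.hs code: `Data.List.NonEmpty` exports `map` and `length`, which would be ambiguous against the Prelude if the module were imported wholesale.

```haskell
module Main (main) where

-- An unqualified `import Data.List.NonEmpty` would make uses of map and length
-- ambiguous with the Prelude. Listing only the names actually needed avoids that.
import Data.List.NonEmpty (NonEmpty (..), toList)

main :: IO ()
main = do
  let xs = 1 :| [2, 3] :: NonEmpty Int
  print (length (toList xs))     -- Prelude length; prints 3
  print (map (* 2) (toList xs))  -- Prelude map; prints [2,4,6]
```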

- Fix JS linker CPP guards: use HAVE_JS_BACKEND instead of HAVE_INTERPRETER
  for linkJSBinary (lines 456, 474, 603 in Pipeline.hs). Linking JS
  binaries doesn't require the interpreter.

- Revert formatting change in Reg/Target.hs:298 to avoid unnecessary
  whitespace changes in an already hard-to-rebase patch.

ghc-boot depends on ghc-platform, so it must be included in the
stage1 build. This was incorrectly removed during rebase conflict
resolution.

Split the monolithic GHC.Hs.Instances module (344 Data instances, 14MB
object file) into 5 parallel-compilable sub-modules:

  - GHC.Hs.Instances.Common      - phase-independent instances
  - GHC.Hs.Instances.Transitions - LR types spanning multiple phases
  - GHC.Hs.Instances.Parsed      - GhcPs instances only
  - GHC.Hs.Instances.Renamed     - GhcRn instances only
  - GHC.Hs.Instances.Typechecked - GhcTc instances only

The wrapper module GHC.Hs.Instances re-exports all sub-modules for
backward compatibility.

Additionally, all Instance modules are now excluded from stage1 builds
(via -interpreter flag) since Data instances are only needed at runtime
for Template Haskell and GHCi, not for compilation.

Benefits:
- Stage1: Saves ~14MB object file + compile time
- Stage2: 5 modules can compile in parallel instead of 1 serial

See #9557 for why these use -O0, #18254 for the parallelism issue.

These modules were added in PR #123 (Build iserv on demand) but were
missing from the interpreter section of ghc.cabal.in after the rebase.

This module was incorrectly added during the rebase - the source file
doesn't exist in either the base branch or our PR.

The deriving instance for Data (HsModule GhcPs) requires Data instances
for HsDecl and other AST types, which are now in the GHC.Hs.Instances.*
modules. Since those modules are only included when the interpreter flag
is enabled, we must also guard this deriving instance.

This fixes the stage1 build failure:
  No instance for 'Data (HsDecl GhcPs)' arising from a use of 'k'
  In the instance declaration for 'Data (HsModule GhcPs)'

The Instance modules provide Data instances that are used throughout
the compiler (e.g., GHC.Iface.Ext.Ast for HIE file generation). They
cannot be conditional on the interpreter flag.

The split into multiple modules was for parallel compilation (#18254),
not for stage1 exclusion. Keep them unconditional so all dependent
modules can compile.

Reverts the conditional placement from the previous commit.

The split Instance modules were missing dependencies on each other:

- Common.hs: Added `import GHC.Hs.Type` for HsTypeGhcPsExt type
- Parsed.hs, Renamed.hs, Typechecked.hs: Added imports from Common
  (for HsFieldBind, HsTypeGhcPsExt instances) and Transitions (for
  StmtLR instances)

This establishes the correct dependency chain:
  Common -> Transitions -> Parsed/Renamed/Typechecked -> Instances

Fixes build errors:
- "No instance for 'Data (StmtLR GhcPs GhcPs ...)'"
- "No instance for 'Data (HsFieldBind ...)'"
- "Not in scope: type constructor or class 'HsTypeGhcPsExt'"

…R dependencies

The NHsValBindsLR type contains a hard-coded [LSig GhcRn] field, creating
an unavoidable dependency: HsExpr GhcPs -> HsLocalBinds -> NHsValBindsLR
-> [LSig GhcRn]. This means many GhcPs Data instances must be in Renamed.hs.

Changes:
- Move all HsLocalBinds-dependent GhcPs instances from Parsed.hs to Renamed.hs
- Move LR instances (HsLocalBindsLR, HsValBindsLR, etc.) to phase-specific modules
- Empty Transitions.hs as all instances now belong in their target phase module
- Reorganize Common.hs to only contain truly phase-independent instances

The split now follows a clear dependency hierarchy:
  Common.hs (no phase deps)
    -> Parsed.hs (GhcPs types without HsLocalBinds dependencies)
    -> Renamed.hs (GhcRn + all GhcPs types depending on HsLocalBinds/GhcRn)
    -> Typechecked.hs (GhcTc types)

This resolves #18254 build failures while maintaining parallel compilation benefits.

- Add stub object-code linker functions to GHC.Runtime.Interpreter
  that panic when called (initObjLinker, lookupSymbol, loadDLL, etc.)
- Fix InterpSymbol import to use GHC.Runtime.Interpreter.Types.SymbolCache
- Fix InterpSymbol type parameter in lookupSymbol/lookupClosure signatures
- Change resolveObjs return type from IO Bool to IO SuccessFlag
- Fix loadDLL return type to IO (Either String (RemotePtr LoadedDLL))
- Fix FixitySig import in GHC.Hs.Instances.Parsed to use source module
  (Language.Haskell.Syntax.Binds) for constructor visibility

These changes allow stage1 builds without interpreter support to compile.
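
A stub of this kind can be sketched as follows. The `loadDLL` result type matches the one named above; the placeholder definitions of `LoadedDLL` and `RemotePtr`, and the use of `ErrorCall` in place of GHC's panic machinery, are assumptions made so the example stands alone.

```haskell
module Main (main) where

import Control.Exception (ErrorCall (..), throwIO, try)

-- | Uninhabited placeholder: lets signatures mention the type without
-- pulling in any interpreter code.
data LoadedDLL

-- | Placeholder for a pointer into the interpreter process.
newtype RemotePtr a = RemotePtr Word

-- | Stub operation. Reaching it in a build without an interpreter is a bug,
-- so it fails loudly instead of returning a made-up value.
loadDLL :: FilePath -> IO (Either String (RemotePtr LoadedDLL))
loadDLL path =
  throwIO (ErrorCall ("loadDLL: no interpreter support in this build: " ++ path))

main :: IO ()
main = do
  r <- try (loadDLL "libfoo.so")
         :: IO (Either ErrorCall (Either String (RemotePtr LoadedDLL)))
  case r of
    Left (ErrorCall msg) -> putStrLn ("caught: " ++ msg)
    Right _              -> putStrLn "unexpectedly succeeded"
```

Because the stub keeps the full signature, call sites type-check identically in both configurations and need no guards of their own.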

Make the entire module conditional on HAVE_INTERPRETER:
- When HAVE_INTERPRETER is defined: use full implementation with all
  interpreter types (WasmInterpConfig, JSInterpConfig, etc.)
- When HAVE_INTERPRETER is not defined: provide stub InterpOpts type
  and initInterpreter that always returns Nothing

This resolves build errors:
- Ambiguous StgToJSConfig (stub vs real type conflict)
- Missing WasmInterpConfig and JSInterpConfig types
- Missing interpreter configuration field names

When building without interpreter support (-interpreter flag),
InterpOpts is a stub empty data type without any fields. The
initInterpOpts function was trying to use all 20+ InterpOpts fields
unconditionally, causing build failures.

This fix makes the entire module conditional on HAVE_INTERPRETER:
- When defined: full implementation creating InterpOpts with all fields
- When not defined: stub that returns empty InterpOpts

This allows stage1 minimal builds (without interpreter) to complete
successfully.

5 participants