File: `docs/design_decisions/DR-001-infra-extension.md` (231 additions, 0 deletions)
<!--
Copyright (c) 2025 Contributors to the Eclipse Foundation

See the NOTICE file(s) distributed with this work for additional
information regarding copyright ownership.

This program and the accompanying materials are made available under the
terms of the Apache License Version 2.0 which is available at
https://www.apache.org/licenses/LICENSE-2.0

SPDX-License-Identifier: Apache-2.0
-->


# DR-001-B-Infra-Extension: S-CORE Build Execution Contract


* **Date:** 2026-01-20

```{dec_rec} S-CORE Build Execution Contract
:id: dec_rec__infra__execution_contract
:status: accepted
:context: Infrastructure
:decision: Adopt a layered execution contract

```

---

## Purpose

This document defines the execution contract for S-CORE builds across developer machines
and CI infrastructure.
Its goal is to ensure **long-term reproducibility (≥10 years)**, **traceability** and
**practical hermeticity**, despite changes in underlying infrastructure such as
GitHub-hosted runners.

It builds on [DR-001], which addressed the same topics but focused on tools only; this document adds detail where the original description was too vague.

The contract is intentionally **layered**, because different parts of the system control
different capabilities and failure modes.

---

## Core Requirements

### R1 — Long-Term Reproducibility
S-CORE builds must be reproducible for **at least 10 years** after creation, given:
- the source revision
- archived execution context
- archived toolchains
- recorded build metadata

This must remain possible even if:
- GitHub runner images change or are retired
- upstream toolchains are no longer available
- external services are unavailable
**Contributor:**
Should we have a test build which has no access to the internet? That way we can be sure all needed tools and code are present.

**Contributor Author:**
I think that would only work if we had a cached remote repository populated for Bazel fetching, or if we configured our own local mirrors.

Or?

How would you see that? For the devcontainer part I think it is straightforward; for Bazel, I'm not so sure.

**Contributor:**
The Bazel mirror would also need to act as an archive, because it must not remove anything needed to build a release for 10+ years. And I guess all S-CORE builds should ideally make use of that mirror.

Btw., in addition to Bazel dependencies being downloaded from the internet, I expect that Rust code might also download some crates from crates.io.

If the needed dependencies are stored within the devcontainer, it would be rather easy: you need the devcontainer image from the release and the S-CORE code, and you should be able to build. No internet access or mirror needed.
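The offline-verification idea from this thread could be sketched with a network namespace; this is an assumption about how such a check might work, not an agreed mechanism, and the eventual payload would be the real Bazel build rather than the placeholder below:

```shell
# Sketch of the "no-internet test build" idea: run the build step inside a
# fresh network namespace, where only an unconfigured loopback exists, so any
# undeclared download fails immediately.
# Real use would be something like: unshare -rn bazel build //...
# Requires unprivileged user namespaces (a Layer 1 concern), hence the fallback.
( unshare -rn sh -c 'echo "running with isolated network"' \
  || echo "user namespaces unavailable on this host" ) | tee netns-check.txt
```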


---

### R2 — Traceability
For every build artifact, it must be possible to determine **exactly**:
- which sources were used
- which toolchains and tool versions were used
- which Bazel version and flags were used
- which execution context (container image) was used
- which host baseline constraints applied

This information must be recorded in a **build manifest** and stored alongside the build
outputs.
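As a sketch, such a manifest could be assembled at the end of a CI job. The file name and all field names below are illustrative, not a fixed schema, and `CONTAINER_DIGEST` is a hypothetical CI variable:

```shell
# Sketch only: emit a minimal build manifest. Field names are hypothetical;
# real values would come from CI variables and the container runtime.
rev=$(git rev-parse HEAD 2>/dev/null || echo "unknown")
bazel_ver=$(cat .bazelversion 2>/dev/null || echo "unknown")
cat > build-manifest.json <<EOF
{
  "source_revision": "${rev}",
  "bazel_version": "${bazel_ver}",
  "container_digest": "${CONTAINER_DIGEST:-unknown}",
  "host_baseline": "$(uname -sr)"
}
EOF
cat build-manifest.json
```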

---

### R3 — Hermeticity (Practical)
Build actions must not depend on **undeclared inputs**.

In practice:
- Tools affecting build outputs must either be:
**Member:**
Let's explain "Tools affecting build outputs" very clearly, with examples of such tools and of tools that are less relevant.

E.g. I'm currently not sure whether pytest affects build outputs. Are test results build outputs?

**Contributor Author:**
Test results are not build outputs, but they do affect CI decisions (quality gates, I guess), so the tools that produce them must still be Bazel-visible if you want correctness, reproducibility and traceability.

Theoretically we should use the Bazel Python rules (rules_python). This would ensure:

- Reproducible test outcomes
- Correct test caching

But I guess it is an acceptable fallback to have pytest installed in the devcontainer.

Do we have a file like a fingerprint.txt which contains the versions of the tools?

    pytest==7.4.2
    python==3.11.6
    container=sha256:deadbeeeeeeeeef.......

That file can be used as an input for the test action. That way Bazel sees when the fingerprint changes and can invalidate the caches accordingly.
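A fingerprint file along these lines could be generated like this; which tools to include, the file name, and the `CONTAINER_DIGEST` variable are assumptions for illustration:

```shell
# Sketch: collect tool versions into a fingerprint file that can later be
# declared as an input of test actions. Tool selection is illustrative.
{
  echo "bash=${BASH_VERSION:-unknown}"
  echo "python=$(command -v python3 >/dev/null && python3 --version | cut -d' ' -f2 || echo unknown)"
  echo "container=${CONTAINER_DIGEST:-unknown}"
} > fingerprint.txt
sort fingerprint.txt
```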

- managed by Bazel, or
**Contributor:**
The writing here suggests that if it's managed by Bazel, it is then automatically solved and we don't need to worry about it anymore.
Is that the case, or am I misreading here?

**Contributor:**
I have doubts about that. Bazelisk downloads even Bazel itself from the internet, and Bazel might then download more dependencies.

**Contributor:**
So for official releases we would definitely need a test build that checks whether the archiving is complete, by building without access to the internet?

**Contributor Author:**
@MaximilianSoerenPollak @lurtz You are both correct.

Maybe we should explicitly mention:

> "Bazel-managed" improves cache correctness and traceability, but does not by itself guarantee long-term reproducibility; artifact availability must also be ensured via pinning, checksums, internal mirroring/archiving and offline verification.

OK, we do have our own Bazel Container Registry, but it would probably be a good idea to mirror the other artefacts that Bazel/Bazelisk downloads (even Bazel itself: it may be that in 10 years, version 7.1.0 (random example) won't be available for download anymore).

- explicitly injected as Bazel action inputs, or
- reflected in cache partitioning
**Member:**
Not sure what you mean here. Theoretically it's enough if they are documented as in R2, but of course we want more.

What about "mirrorable"... no idea how to describe it. I'm talking about PyPI, for example.

- Reliance on host state must be minimized and documented where unavoidable.
**Member:**
This is not the same indentation as above. So is "Tools affecting build outputs must either be ... documented" missing?


Perfect hermeticity is not required, but **undeclared variability is not acceptable**.
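Parts of this policy map directly onto existing Bazel flags. The fragment below is a sketch, not a vetted S-CORE configuration; the cache and distdir paths are made-up examples:

```
# .bazelrc sketch: an illustrative combination of real Bazel options.
# The paths below are placeholder examples.
build --spawn_strategy=linux-sandbox             # sandbox actions (where Layer 1 allows it)
build --incompatible_strict_action_env           # do not leak host PATH/env into actions
build --repository_cache=/opt/score/repo-cache   # archivable cache of fetched dependencies
build --distdir=/opt/score/archived-deps         # prefer pre-archived tarballs over the network
```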

---

## Three-Layer Execution Contract

### Layer 1 — Host Platform Contract
**Member:**
x86, arm for Macs would also belong to host?!

**Contributor Author:**
Good point

**Contributor Author:**
But I've no idea what's the correct list of supported archs for hosts. Who could help clarify this?

**Contributor:**
At least the devcontainers we build for both arm64 and x86_64. I would say at the moment that pretty much covers all hosts. I successfully ran the devcontainer on Apple M3 (arm64) macOS, Windows WSL2 (arm64 (Snapdragon laptop) and x86_64), and Linux (arm64 (Snapdragon laptop) and x86_64). I cannot imagine what else we need.

**Contributor:**
Note: ran the devcontainer, started the tools - not: successfully built S-CORE. The issue here is: the Bazelized tools are not all available for arm64.


This layer defines the **non-virtualized constraints** imposed by the machine running
the build.

#### Scope
- GitHub-hosted runners
- self-hosted runners (VM or bare metal)
- kernel-level features shared with containers

#### Responsibilities
- Linux kernel version and configuration
- Security mechanisms (AppArmor / LSM)
- Filesystems, networking, namespaces
- Support for:
- Bazel `linux-sandbox`
- QEMU / `binfmt`
- privileged operations where required

#### Requirements
- Linux host OS (Ubuntu LTS for reference integration)
**Member:**
There is an endless discussion going on about host platforms. Let's not get involved.

Suggested change:

    - - Linux host OS (Ubuntu LTS for reference integration)
    + - Linux host OS (For now Ubuntu LTS for reference integration)

- Kernel must support:
- unprivileged user namespaces
- mount operations required by Bazel sandboxing
- Host security policies must not block Bazel `linux-sandbox` unless explicitly documented

#### Known Constraints
- Ubuntu 24.04+ AppArmor may block Bazel sandbox mount operations
- Containers **cannot** mitigate host-kernel restrictions

#### Policy
- A **Reference Integration Host Baseline** must be defined (e.g. Ubuntu 22.04).
- Deviations (e.g. privileged runners, sandbox disabled) must be explicit and isolated.
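A host could self-check part of this baseline; the probe below is a sketch covering only two of the constraints, with Linux-specific paths assumed:

```shell
# Sketch: probe two of the Layer 1 prerequisites named above. A real
# baseline check would cover more (sandbox mounts, AppArmor policy, etc.).
{
  if [ -r /proc/sys/user/max_user_namespaces ]; then
    echo "user namespaces: max=$(cat /proc/sys/user/max_user_namespaces)"
  else
    echo "user namespaces: limit not reported by this kernel"
  fi
  echo "kernel: $(uname -r)"
} | tee host-baseline.txt
```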

---

### Layer 2 — Execution Context Contract (Devcontainer)
**Member:**
Do we already assume builds are executed in a devcontainer? So far we have only agreed that the development environment is in devcontainers. Maybe this layer is optional?

Suggested change:

    - ### Layer 2 — Execution Context Contract (Devcontainer)
    + ### Layer 2 — Execution Context Contract (e.g. native or devcontainer)

**Member:**
Ah, the last point says it's optional. That's quite late ;-)


This layer defines the **default user-space environment** in which builds are executed.

#### Purpose
- Provide consistent runtime ABI (`glibc`, `libstdc++`)
- Ensure tool binaries (e.g. rustc) can execute reliably
- Eliminate “works on my machine” discrepancies
- Enable local reproduction of CI builds
**Member:**
We need to figure out what exactly has to be identical. Or is it this list?
E.g. the devcontainer is based on Ubuntu 24 and the GitHub runners are based on Ubuntu 24. Is it then enough to ensure the same Python version for Python scripts?

**Contributor:**
You can specify in GitHub Actions workflows that a specific container image and version shall be used: https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#jobsjob_idcontainerimage

Then there is no crossing of fingers anymore about whether CI and the devcontainer have the same tools and versions installed.

**Contributor Author:**
@lurtz correct.
I guess it's time we start converting our workflows to using the devcontainers.
Not sure if there's something in particular needed (in terms of configuration of the container) for the Bazel local cache.

@AlexanderLanin if we go ahead with using devcontainers, we can make the clean-up action even more aggressive.


#### Definition
- A **versioned devcontainer image** is the default execution context for CI and local builds.
- The container image must be:
- built from a **defined Ubuntu LTS baseline**
- compatible with common developer tooling (e.g. by following the [specification](https://containers.dev/))
- referenced by an **immutable image digest**
- archived for **long-term reproducibility**
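Digest pinning can be expressed directly in `devcontainer.json`; the registry path below is illustrative and the digest is a placeholder:

```jsonc
{
  // Sketch: reference the execution context by immutable digest, not by tag.
  // Registry path and digest are placeholders, not the real S-CORE image.
  "image": "ghcr.io/eclipse-score/devcontainer@sha256:<digest>"
}
```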

#### Baseline Preservation and Reproducibility
- Once a devcontainer image is used in CI, its image digest becomes part of the build provenance
- All such images must be archived and retrievable for **at least 10 years**
- Reproducing historical builds may rely on legacy container runtimes or CLI-only execution,
and does not require continued IDE support

#### Responsibilities
- User-space runtime libraries
- Bootstrap tooling (git, bash, coreutils, python, etc.)
- Bazel entrypoint (preferably Bazelisk)
**Contributor:**
I am a bit puzzled. IIRC, Bazelisk transparently downloads whatever Bazel version has been specified via .bazelversion from the internet. How will this ensure stable builds 10+ years from now? We have no guarantee that the server address will still be the same and the version still available.

**Contributor Author:**
Good point again!

Bazelisk is fine as an entrypoint, but only if we pair it with an archived, controlled source of Bazel binaries.

Would that mean creating our own internal mirror for Bazel releases (within Eclipse)?

What do you think?

**Contributor:**
I have to be open: I do not like all of Bazel's concepts. It is amazing for tracking dependencies, caching, and its best-effort sandboxing when building and running tests. But I would rather not let it download itself, tools or toolchains, and would store these within the devcontainer image instead. However, this goes against the point you made about the devcontainer being optional.

That being said, you also stated that the entire S-CORE should use a single Bazel version:

> - S-CORE uses a **single Bazel version** across repositories.

To me the best solution would be to include exactly THIS Bazel version in the devcontainer instead of using Bazelisk.

Maybe we should discuss how much infrastructure (Bazel mirror, Bazel archive, devcontainer registry) we want to build, or whether we want to create a design which needs less infrastructure. I lean towards a solution which fulfills all requirements but needs as little infrastructure in the background as possible.

**Contributor Author:**
That's fair, but at the moment all the repos have their own .bazelversion file, right? Example: https://github.com/eclipse-score/communication/blob/main/.bazelversion

Wouldn't that override whatever we set in the devcontainer?

We'd have to set up an enforcement mechanism, or is there one already?

**Contributor:**
IMHO there are multiple ways to achieve this inside the devcontainer.

- You install a specific Bazel version in the devcontainer and set an environment variable which overrides any .bazelversion file. However, this happens silently.
- We could also not ship Bazelisk in the devcontainer; then the build will fail if the .bazelversion does not match the version of the preinstalled Bazel binary.

- Development UX tooling (optional)

#### Non-Goals
- The devcontainer must **not silently override** repository-declared Bazel versions.
- The devcontainer must **not be the only place** where critical tool versions are defined.
**Contributor:**
Where else would you define it then? In a global ledger or so?

**Contributor Author:**
Not a global ledger; the source of truth should be the repo (e.g. .bazelversion, MODULE.bazel/lockfile, pinned toolchain deps). The devcontainer may provide binaries, but it must not be the only place where versions are defined; otherwise changes to the container silently change builds. I guess this is part of the 3-layered approach.

**Contributor:**
Weeeelll, one can see this the other way around: the devcontainer defines the environment, including the Bazel version.

BUT

Changing the devcontainer in a repository is not silent. It is an explicit PR with a version change of the container. That change must build & test the repository content using the updated container. If that builds, all good, right? If not, the PR fails and investigation is required. No blocking of development or surprises at any point in time.

**Contributor:**
> otherwise changes to the container silently change builds

This can only happen if no fixed revisions are used. E.g. `ghcr.io/eclipse-score/devcontainer:latest` will silently change, but `ghcr.io/eclipse-score/devcontainer:v1.1.0` will always stay the same.


#### Policy
- The devcontainer defines the **default** environment, not the **only** supported one.
- Builds should still be possible on compatible bare-metal hosts.

---

### Layer 3 — Bazel Contract

This layer defines **what Bazel controls and guarantees**.

#### Bazel Versioning
- Each relevant repository must contain `.bazelversion`.
- S-CORE uses a **single Bazel version** across repositories.
- CI enforces version consistency.
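The consistency check could look roughly like this in CI; the repository names and the version `7.1.0` are made-up examples, and a real check would iterate over the actual S-CORE checkouts:

```shell
# Sketch of a CI consistency check: verify all checked-out repos declare
# the same .bazelversion. Demo checkouts are created here so the sketch
# is self-contained; repo names and version are hypothetical.
set -eu
mkdir -p demo/repo-a demo/repo-b
printf '7.1.0\n' > demo/repo-a/.bazelversion
printf '7.1.0\n' > demo/repo-b/.bazelversion

expected=$(cat demo/repo-a/.bazelversion)
for f in demo/*/.bazelversion; do
  v=$(cat "$f")
  [ "$v" = "$expected" ] || { echo "MISMATCH: $f declares $v"; exit 1; }
done
echo "all repositories pinned to Bazel $expected"
```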

#### Toolchains and Tools
- Toolchains (e.g. Rust/Ferrocene, C/C++) must be:
- versioned
- immutable
- built against a documented baseline
- Tools affecting outputs must be known to Bazel or reflected in action inputs.
**Contributor:**
"Outputs" in which way?
Does this mean, for example, that if something saves a .json that is used as a cache, it should only work via Bazel actions?
Or what about the test frameworks that can affect the output XML?

**Contributor Author:**
"Outputs" should mean any build- or CI-relevant result that we rely on, not just binaries.

1. **Build artifacts** (most important). Examples:
   - binaries, libraries, containers, packages
   - generated source code that is checked in or shipped
   - compiled outputs used downstream
2. **Decision artifacts** (CI gating). I know that we don't have them yet, but I guess we need to add them at some point 😄. Examples:
   - test pass/fail outcome
   - coverage percentage used as a gate
   - lint results used to pass/fail

If a tool can change these, Bazel must track the tool version / inputs (or we risk wrong decisions / wrong cached results). Yeah, yeah, I know, where's the cache?


#### Hermeticity Guarantees
- Bazel sandboxing provides reproducibility **given runnable tools**.
- Bazel does **not** virtualize:
- kernel
- `glibc`
- host security configuration

These constraints must be handled in Layer 1 and Layer 2.

---

## Minimum Supported Baselines

### OS and Runtime Baseline
- Minimum supported baseline: **Ubuntu 20.04 LTS** (subject to revision)
**Member:**
Baseline is defined in layer 2. So layer 3 should not mention an exact version?

**Contributor Author:**
I think this is relevant for the underlying OS/ kernel version/ glibc

- Toolchains must be built against this baseline
- Older environments are **not supported**

### Rationale
We explicitly do **not** support all historical `glibc` or kernel versions.
Portability is achieved by choosing and documenting a baseline, not by unlimited
backward compatibility. Layer 2 can easily be virtualized as needed for future reproducibility.

---

## Build Provenance and Archiving

Each CI build must produce and archive:
- build manifest (metadata)
- container image digest
- toolchain identifiers
- source revision(s)

These artifacts form the basis for:
- long-term reproducibility
- forensic analysis
- compliance and auditing
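Archival integrity can be made verifiable with checksums; a minimal sketch with illustrative file names and contents:

```shell
# Sketch: record checksums when archiving provenance artifacts, so later
# retrieval can be verified bit-for-bit. File names are illustrative.
mkdir -p archive
printf 'example manifest contents\n' > archive/build-manifest.json
( cd archive && sha256sum build-manifest.json > SHA256SUMS )
# ...years later: verify the retrieved archive is intact
( cd archive && sha256sum -c SHA256SUMS )
```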

---

## Summary

- **Layer 1** defines what the host *must* provide.
- **Layer 2** defines the default execution environment.
- **Layer 3** defines how Bazel achieves reproducibility and caching.
- Reproducibility, traceability, and hermeticity are enforced through
**explicit contracts**, not assumptions.

This separation allows S-CORE to scale infrastructure, evolve toolchains, and still
reproduce builds years into the future.