diff --git a/doc/plans/oci-sealing-impl.md b/doc/plans/oci-sealing-impl.md deleted file mode 100644 index dea523c9..00000000 --- a/doc/plans/oci-sealing-impl.md +++ /dev/null @@ -1,210 +0,0 @@ -# OCI Sealing Implementation in composefs-rs - -This document describes the implementation of OCI sealing in composefs-rs. For the generic specification applicable to any composefs implementation, see [oci-sealing-spec.md](oci-sealing-spec.md). - - - -## Current Implementation Status - -### What Exists - -The `composefs-oci` crate at `crates/composefs-oci/src/image.rs` already implements the core sealing mechanism. The `seal()` function computes the fsverity digest via `compute_image_id()`, creates an EROFS image from merged layers with whiteouts applied, and stores the digest in `config.labels["containers.composefs.fsverity"]`. A new config with updated labels is written via `write_config()`, returning both the SHA256 config digest and fsverity image digest. - -The implementation includes fsverity computation and verification through the `composefs` crate's fsverity module. Config label storage follows the OCI specification with digest mapping from SHA256 to fsverity maintained in split streams. Repository-level integrity verification is provided through `check_stream()` and `check_image()`. Mount operations check for the seal label and use fsverity verification when present. - -All objects in the repository are fsverity-enabled by default, with digests stored using the generic `ObjectID` type parameterized over `FsVerityHashValue`. Images are tracked separately in the `images/` directory, distinct from general objects due to the kernel security model that restricts non-root filesystem mounting. - -### Current Workflow - -The sealing workflow in composefs-rs begins with `create_filesystem()` building the filesystem from OCI layers. Layer tar streams are imported via `import_layer()`, converting them to composefs split streams. 
Files 64 bytes or smaller are stored inline in the split stream, while larger files are stored in the object store with fsverity digests. Layers are processed in order, applying overlayfs semantics including whiteout handling (`.wh.` files). Hardlinks are tracked properly across layers to maintain filesystem semantics. - -After building the filesystem, `compute_image_id()` generates the EROFS image and computes its fsverity digest. The digest is stored in the config label `containers.composefs.fsverity`. The `write_config()` function writes the new config to the repository with the digest mapping, and both the SHA256 config digest and fsverity image digest are returned. - -For mounting, the `mount()` operation requires the `containers.composefs.fsverity` label to be present. It extracts the image ID from the label and mounts at the specified path with kernel fsverity verification. - -## Repository Architecture - -The composefs-rs repository architecture at `crates/composefs/src/repository.rs` supports sealing without major changes. Objects are stored in a content-addressed layout under `objects/XX/YYY...` where `XX` is the first byte of the fsverity digest and `YYY` are the remaining 62 hex characters. All files in `objects/` must have fsverity enabled, enforced via `ensure_verity_equal()`. - -Images are tracked separately in the `images/` directory as symlinks to objects, with refs providing named references and garbage collection roots. Split streams are stored in the `streams/` directory, also as symlinks to objects. The repository has an "insecure" mode for development without fsverity filesystem support, but sealing operations should explicitly fail in this mode. - -Two-level naming allows access by fsverity digest (verified) or by ref name (unverified). The `ensure_stream()` method provides idempotent stream creation with SHA256-based deduplication. 
Streams can reference other streams via digest maps stored in split stream headers, enabling the layer→config relationship tracking. - -## Required Enhancements - -### Manifest Annotations - -Manifest annotations should be added to indicate sealed images and enable discovery without parsing configs. The sealing operation should add `containers.composefs.sealed` set to `"true"` and optionally `containers.composefs.image.fsverity` containing the image digest. This allows registries to discover sealed images and clients to optimize pull strategies. - -### Per-Layer Digest Annotations - -Per-layer digests enable incremental verification and caching. A `SealedImageInfo` structure should track the image fsverity digest, config SHA256 digest, optional config fsverity digest, and a list of layer seal information. Each `LayerSealInfo` entry should contain the original tar layer digest, the composefs fsverity of the layer, and the split stream digest in the repository. - -During sealing, layer descriptors should be annotated with `containers.composefs.layer.fsverity` after processing each layer. This allows verification of individual layers before merging and enables caching where shared layers have known composefs digests. - -### Verification API - -A standalone verification API separate from mounting should be implemented. The verification function should check manifest annotations for the seal flag, fetch and verify the config against the manifest's config descriptor, extract the fsverity digest from the config label, verify annotated layers if present, and optionally verify the image exists in the repository. - -This enables verification before mounting and provides detailed seal information without building the filesystem. The returned `SealedImageInfo` structure contains all digest relationships and layer details. - -### Pull Integration - -The `pull()` function in `crates/composefs-oci/src/image.rs` should be enhanced to handle sealed images. 
When a verify_seal flag is enabled, the pull operation should check manifest annotations for the sealed flag and verify the seal during pull if present. If the image is sealed and verification passes, some integrity checks can be skipped since the composefs digests are trusted. - -An optimization is that sealed images don't require re-computing digests during import if verification already passed. The pull result should include optional seal information alongside the manifest and config. - -### Push Integration - -Support for pushing sealed images back to registries requires preserving seal annotations through the registry round-trip. The push operation should construct the manifest with seal annotations, push the config with the composefs label, push layers optionally with layer annotations, and push the manifest with seal annotations. - -The challenge is maintaining digest mappings through the registry round-trip, as registries may re-compress or re-package layers while preserving content digests. - -### Insecure Mode Handling - -Repository sealing operations should explicitly fail when the repository is in insecure mode. The rationale is that if the repository doesn't enforce fsverity, sealing provides no security benefit. The check should be performed at the beginning of seal operations, returning an error if `repo.is_insecure()` is true. - -## Implementation Phases - -### Phase 1: Core Sealing (Completed) - -Phase 1 is complete with basic `seal()` implementation in `composefs-oci`, fsverity computation and storage, config label with digest, and mount with seal verification. - -### Phase 2: Manifest Annotations (Planned) - -Phase 2 will add manifest annotation support to `seal()`, create the `SealedImageInfo` type, implement the `verify_seal()` API, document the label/annotation schema, and add tests for sealed image workflows. 
- -Deliverables include `seal()` emitting manifests with annotations, standalone verification without mounting, and updated documentation in `doc/oci.md`. - -### Phase 3: Per-Layer Digests (Planned) - -Phase 3 will record per-layer fsverity during sealing, add layer annotations to manifests, implement incremental verification, and optimize pull for sealed images. - -Deliverables include full `SealedImageInfo` with layer details, layer-by-layer verification API, and performance improvements for sealed pulls. - -### Phase 4: Push/Registry Integration (Planned) - -Phase 4 will implement push support for sealed images, preserve annotations through registry round-trip, test with standard OCI registries, and document registry compatibility. - -Deliverables include bidirectional registry support, a registry compatibility matrix, and integration tests with real registries. - -### Phase 5: Advanced Features (Future) - -Future work includes dumpfile digest support, eager/lazy verification modes, zstd:chunked integration, the three-digest model, and signature integration. - -## API Design Considerations - -### Type Safety - -The generic `ObjectID` type parameterized over `FsVerityHashValue` provides type safety for digest handling. Both `Sha256HashValue` and `Sha512HashValue` implement the `FsVerityHashValue` trait with hex encoding/decoding, object pathname format, and algorithm ID constants. - -### Async/Await - -Operations like `seal()` and `pull()` are async to support parallel layer fetching with semaphore-based concurrency control. The repository is wrapped in `Arc` to enable sharing across async contexts. - -### Error Handling - -The codebase uses `anyhow::Result` for error handling with context. Seal operations should provide clear error messages distinguishing between fsverity failures, missing labels, and repository integrity issues. 
- -### Verification Modes - -Supporting both eager and lazy verification requires a configuration option, potentially as an enum `SealVerificationMode` with variants `Eager`, `Lazy`, and `Never`. Different defaults may apply for user versus system repositories. - -## Integration Points - -### Split Streams - -Split streams at `crates/composefs/src/splitstream.rs` are the intermediate format between OCI tar layers and composefs EROFS images. They contain inline data for small files and references to objects for large files. Split stream headers include digest maps linking SHA256 layer digests to fsverity digests. - -Per-layer sealing should leverage split streams to maintain the digest mapping. The split stream format doesn't need changes but seal metadata should reference split stream digests. - -### EROFS Generation - -EROFS image generation via `mkfs_erofs()` in `crates/composefs/src/erofs/` creates reproducible images from filesystem trees. The EROFS writer handles inline data, shared data, and metadata blocks with deterministic layout. The same input filesystem produces the same EROFS digest. - -Sealing relies on this determinism for verification. The EROFS format version may evolve, which is why dumpfile digests are being considered as a format-agnostic alternative. - -### Fsverity Module - -The fsverity module at `crates/composefs/src/fsverity/` provides userspace computation matching kernel behavior and ioctl wrappers for kernel interaction. Digest computation uses a hardcoded 4096-byte block size with no salt support, matching kernel fs-verity defaults. - -Sealing uses `compute_verity()` for userspace digest computation during EROFS generation and `enable_verity_maybe_copy()` to handle ETXTBSY by copying files if needed. Verification uses `measure_verity()` to get kernel-measured digests and `ensure_verity_equal()` to compare against expected values. 
- -## Open Implementation Questions - -### Config Annotation Method - -The current code calls `config.get_config_annotation()` which actually reads from labels, not annotations. This naming suggests potential confusion between OCI label and annotation semantics. Clarification is needed whether storing in labels is intentional or if annotations should be used for the digest. - -### Sealed Config Mutability - -Sealing modifies config content by adding the label, creating a new SHA256 for the config and breaking existing references to the old config digest. This may be acceptable since the sealed config is a new artifact, but it needs clear documentation about the relationship between sealed and unsealed images. - -### Performance at Scale - -Computing fsverity for large images is expensive as `compute_image_id()` builds the entire EROFS in memory. Streaming approaches or caching strategies should be considered for multi-GB images. The EROFS writer could be enhanced to support streaming output with incremental digest computation. - -### Seal Metadata Persistence - -Optionally persisting `SealedImageInfo` as `.seal.json` alongside images in the repository could enable faster seal information retrieval without re-parsing configs. This metadata cache would need invalidation strategies and shouldn't be security-critical. - -### Repository Ref Strategy - -Sealed images have different config digests than unsealed images. The ref strategy for managing variants should avoid keeping both sealed and unsealed versions indefinitely. Garbage collection should understand the relationship between sealed and unsealed images, potentially tracking seal derivation relationships. - -## Testing Strategy - -Testing should cover sealing unsealed images and verifying the config label is added correctly with the expected fsverity digest. Mounting sealed images should verify that fsverity is checked by the kernel. 
Verification API tests should check correct extraction of seal information from manifest and config. - -Per-layer annotation tests should verify layer digests are computed and annotated correctly. Pull integration tests should verify detection and verification of sealed images during pull. Push integration tests should verify seal metadata is preserved through registry round-trip. - -Negative tests should verify that seal operations fail in insecure mode, mounting fails with incorrect fsverity digest, and verification fails with missing or incorrect labels. - -Performance tests should measure sealing time for various image sizes and verify parallel layer processing performance. - -## Compatibility Considerations - -### OCI Registry Compatibility - -Standard OCI registries should store and serve sealed images without special handling. Unknown labels and annotations are preserved by spec-compliant registries. Testing should verify round-trip through common registries like Docker Hub, Quay, and GitHub Container Registry. - -### Existing Composefs-rs Versions - -The seal format version label enables detection of format changes. Forward compatibility means newer implementations can read older seals. Backward compatibility means older implementations should gracefully ignore newer seal formats they don't understand. - -### C Composefs Compatibility - -While composefs-rs aims to become the reference implementation, compatibility with the C composefs implementation should be maintained where feasible. EROFS images and dumpfiles should be interchangeable. Digest computation must match exactly between implementations. - -## Future Implementation Work - -### Dumpfile Digest Support - -Supporting dumpfile digests requires adding `containers.composefs.dumpfile.sha256` label computation during sealing. Verification should support parsing EROFS back to dumpfile format and verifying the digest. 
Caching the dumpfile→fsverity mapping requires careful security consideration to avoid cache poisoning. - -### zstd:chunked Integration - -Integration with zstd:chunked requires reading and writing TOC metadata with fsverity digests added to entries. The TOC format from the estargz/stargz-snapshotter projects would need extension for fsverity. Direct TOC→dumpfile conversion would enable unified metadata handling. - -### Non-Root Mounting Helper - -A separate composefs-mount-helper service would accept dumpfiles from unprivileged users, generate EROFS images, validate fsverity, and return mount file descriptors. This requires privileged service implementation with careful input validation on the dumpfile format. - -### Signature Integration - -Integrating with cosign or sigstore requires fetching and verifying signatures during pull, associating signatures with sealed images in the repository, and potentially storing signature references in seal metadata. The signature verification should happen before seal verification in the trust chain. - -## References - -See [oci-sealing-spec.md](oci-sealing-spec.md) for the generic specification and complete reference list. 
- -**Implementation references**: -- `crates/composefs-oci/src/image.rs` - OCI image operations including seal() -- `crates/composefs/src/repository.rs` - Repository management -- `crates/composefs/src/fsverity/` - Fsverity computation and verification -- `crates/composefs/src/splitstream.rs` - Split stream format -- `crates/composefs/src/erofs/` - EROFS generation - -**Related composefs-rs issues**: -- Check for existing issues about OCI sealing enhancements -- File new issues for specific implementation work items diff --git a/doc/plans/oci-sealing-spec.md b/doc/plans/oci-sealing-spec.md index 98d000bf..31792f84 100644 --- a/doc/plans/oci-sealing-spec.md +++ b/doc/plans/oci-sealing-spec.md @@ -8,164 +8,521 @@ Container images need cryptographic verification that efficiently covers the ent Hence verifying the integrity of an individual file would require re-synthesizing the entire tarball (using tar-split or equivalent) and computing its digest. +## Related projects + +- **[containerd EROFS snapshotter](https://github.com/containerd/containerd/blob/main/docs/snapshotters/erofs.md)**: Converts OCI layers to EROFS blobs with optional fsverity protection. Supports `enable_fsverity = true` to enable fs-verity on layer blobs. Uses reproducible builds with erofs-utils 1.8+ (`-T0 --mkfs-time`). dm-verity integration is planned but not yet implemented. + ## Solution The core primitive of composefs is fsverity, which allows incremental online verification of individual files. The complete filesystem tree metadata is itself stored as a file which can be verified in the same way. The critical design question is how to embed the composefs digest within OCI image metadata such that external signatures can efficiently cover the entire filesystem tree. -## Design Goals +## Core Design -The OCI sealing specification aims to provide efficient verification where a signature on an OCI manifest cryptographically covers the entire filesystem tree without re-hashing content. 
The specification defines standardized metadata locations for composefs digests and supports future format evolution without breaking existing images. +"composefs digest" here means the fsverity digest of the EROFS metadata file. fsverity is configurable based on digest algorithm (SHA-256 or SHA-512 currently) and block size (4k or 64k). -Incremental verification must be supported, enabling verification of individual layers or the complete flattened filesystem. The design accommodates both registry-provided sealed images and client-side sealing workflows while maintaining backward compatibility with existing OCI tooling and registries. +For standardized short form of the combination, a string of the form `fsverity-${DIGEST}-${BLOCKSIZEBITS}` is used. The `fsverity-` prefix makes clear this is an fsverity Merkle tree digest, not a simple hash: -## Core Design +- `fsverity-sha256-12` (SHA-256, 4k block size, 2^12) +- `fsverity-sha512-12` (SHA-512, 4k block size) +- `fsverity-sha256-16` (SHA-256, 64k block size, 2^16) +- `fsverity-sha512-16` (SHA-512, 64k block size) -### Composefs Digest Storage +Digests are encoded as lowercase hexadecimal. -The composefs fsverity digest is stored as a label in the OCI image config: +### EROFS Provisioning Modes -```json -{ - "config": { - "Labels": { - "containers.composefs.fsverity": "sha256:a3b2c1d4e5f6..." - } - } -} +There are two modes for how the EROFS metadata image is obtained by a client. erofs-alongside is the primary mode and the focus of this specification. canonical-EROFS is a future evolution that builds on it. + +#### EROFS-alongside mode (primary) + +In this mode, the EROFS metadata image is built server-side as part of a composefs OCI artifact which is also stored on the registry. It's important to emphasize that this process can happen independent of the image build; it operates similarly to a signature. Clients unaware of composefs work as before. 
+ +This is the primary mode because: + +- It works today without cross-implementation EROFS standardization — the exact EROFS bytes are authored by the image publisher, so there is no need for multiple implementations to agree on a bit-for-bit identical layout. +- EROFS is a natural metadata format for incremental pulls and content-addressed object stores (see [Incremental Pulls](#incremental-pulls-via-erofs-alongside) in Future Directions). Any incremental fetch mechanism needs a separate metadata format, and EROFS — natively supported by the Linux kernel with multiple userspace parsers — is a strong fit. +- The EROFS here is just metadata; the tar layer is still required for content. + +See [Composefs Artifact Structure](#composefs-artifact-structure) below for more information about the layout. + +To prevent the "representational ambiguity" problem — what happens when the tar layer and the prebuilt EROFS disagree — the client MUST verify consistency: + +1. Fetch the composefs artifact and verify that it has a 1-to-1 correspondence with the source image manifest: each layer in the manifest must have exactly one matching EROFS metadata entry in the artifact (identified by position). A mismatch in count is a fatal error. +2. For each layer, verify the metadata correspondence between the tar layer and the EROFS: + - Parse the tarball to extract a filesystem tree representation (file paths, modes, ownership, xattrs, and fsverity content digests) + - Walk the corresponding EROFS metadata to extract the same representation + - Compare the two — they must agree on all filesystem metadata and content references. Any disagreement is a fatal error. + +This consistency check operates at the semantic filesystem level, not at the EROFS byte level. It does not require a canonical EROFS specification, but it does require agreement on how tar entries map to filesystem metadata (see [doc/oci.md](../oci.md) for OCI-to-composefs conversion decisions). 
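
The two verification steps above can be sketched in Rust, the language of composefs-rs. This is a minimal illustration under stated assumptions, not the composefs-rs API: `EntryMeta`, `Tree`, `Mismatch`, `check_layer`, and `check_consistency` are hypothetical names, and the sketch assumes the tar layer and the EROFS image have each already been parsed into a path-keyed map.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-entry metadata. The fields mirror the list in the text
/// (modes, ownership, xattrs, fsverity content digests); not a composefs-rs type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct EntryMeta {
    mode: u32,
    uid: u32,
    gid: u32,
    xattrs: BTreeMap<String, Vec<u8>>,
    /// Lowercase hex fsverity digest of the content; None for non-regular files.
    content_digest: Option<String>,
}

/// File path -> metadata, as extracted from either the tar layer or the EROFS.
type Tree = BTreeMap<String, EntryMeta>;

/// Every variant is a fatal error in the sense of the text above.
#[derive(Debug, PartialEq, Eq)]
enum Mismatch {
    LayerCount { manifest: usize, artifact: usize },
    MissingInErofs { layer: usize, path: String },
    MissingInTar { layer: usize, path: String },
    Differs { layer: usize, path: String },
}

/// Step 2: compare one layer's tar-derived tree with its EROFS-derived tree.
fn check_layer(layer: usize, tar: &Tree, erofs: &Tree) -> Result<(), Mismatch> {
    for (path, tar_meta) in tar {
        match erofs.get(path) {
            None => return Err(Mismatch::MissingInErofs { layer, path: path.clone() }),
            Some(m) if m != tar_meta => {
                return Err(Mismatch::Differs { layer, path: path.clone() })
            }
            Some(_) => {}
        }
    }
    // The EROFS must not contain entries that the tar layer lacks.
    if let Some(path) = erofs.keys().find(|p| !tar.contains_key(*p)) {
        return Err(Mismatch::MissingInTar { layer, path: path.clone() });
    }
    Ok(())
}

/// Step 1 (positional 1-to-1 correspondence), then step 2 for each layer.
fn check_consistency(tar_layers: &[Tree], erofs_layers: &[Tree]) -> Result<(), Mismatch> {
    if tar_layers.len() != erofs_layers.len() {
        return Err(Mismatch::LayerCount {
            manifest: tar_layers.len(),
            artifact: erofs_layers.len(),
        });
    }
    for (i, (tar, erofs)) in tar_layers.iter().zip(erofs_layers).enumerate() {
        check_layer(i, tar, erofs)?;
    }
    Ok(())
}
```

Because the comparison runs on parsed trees rather than EROFS bytes, two images with different on-disk layouts but identical metadata would both pass, which matches the semantic-level check described above.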
+ +**Security consideration: parsing untrusted EROFS.** In this mode, the EROFS image is data fetched from a registry. When fsverity signatures are present, the EROFS signature is verified before mount — trust in the EROFS is trust in the publisher, the same as any signed artifact. However, the userspace consistency check (step 2 above) still parses the EROFS before signature verification, and in the unsigned/digest-only case, the EROFS is entirely attacker-controlled at parse time. This is an attack surface distinction from canonical-EROFS mode, where the EROFS is locally generated from trusted inputs. + +To mitigate this, EROFS parsing code — both userspace and in-kernel — should be written in memory-safe languages or otherwise hardened. The composefs-rs userspace parser is written in Rust. The Linux kernel's EROFS implementation is fuzz-tested via syzbot and has been hardened over multiple release cycles. Implementations SHOULD validate EROFS structural integrity (superblock magic, bounds checks, inode consistency) before performing the semantic consistency check or mounting. + +#### Canonical-EROFS mode (future) + +This mode is not yet usable — it is blocked on the EROFS standardization work described in [standardized-erofs-meta.md](standardized-erofs-meta.md). + +In this mode, no EROFS metadata is shipped on the wire. The client and server generate the EROFS using a standardized canonical process: + +``` +tar layer → dumpfile → EROFS metadata ``` -The config represents the container's identity rather than transport metadata. Manifests are transport artifacts that can vary across different distribution mechanisms. Adding the composefs label creates a new config and thus a new manifest, establishing the sealed image as a distinct artifact. This means sealing an image produces a new image with a different config digest, where the original unsealed image and sealed image coexist as separate artifacts that registries treat as distinct versions. 
+This requires a finalized canonical EROFS specification that guarantees byte-for-byte identical output across implementations given identical input. Without this guarantee, fsverity digests computed by different implementations would not match, and signatures would fail to verify. -### Digest Type +In this mode, the composefs digest annotations on the image manifest (or in the composefs artifact) serve as the sole reference. The client generates the EROFS, computes its fsverity digest, and verifies it matches the annotation. No EROFS bytes need to be stored on the registry. -The primary digest is the fs-verity digest of the EROFS image containing the merged, flattened filesystem. This digest provides fast verification at mount time through kernel fs-verity checks and is deterministic: the same input layers always produce the same EROFS digest. The digest covers the complete filesystem tree including all metadata such as permissions, timestamps, and extended attributes. +Canonical-EROFS is best understood as a future tightening of erofs-alongside: once a canonical EROFS specification is defined, erofs-alongside artifacts could be required to use the canonical layout. This would allow clients to verify the EROFS against the tar layer by regenerating it locally, without needing to parse the shipped EROFS at all. In effect, the shipped EROFS would become a cache of a deterministic computation. -### Merged Filesystem Representation +#### Digest-only mode (future, requires canonical-EROFS) -The config label contains the digest of the merged, flattened filesystem. This represents the final filesystem state after extracting all layers in order, applying whiteouts (`.wh.` files), merging directories where the most-derived layer wins for metadata, and building the final composefs EROFS image. +Once canonical-EROFS is available, a further simplification becomes possible: **no composefs artifact at all**. 
The composefs digest is placed directly on the image manifest layer annotations (see [Composefs Digest Storage](#composefs-digest-storage)), and the client generates the canonical EROFS locally, verifying its fsverity digest against the annotation. -### Per-Layer Digests (Future Extension) +This is the cleanest end state — the OCI image carries only standard tar layers with a composefs digest annotation, and composefs is purely a client-side optimization. No separate artifact, no EROFS on the wire, no signatures beyond whatever already covers the manifest (cosign, sigstore, etc.). -Per-layer composefs digests may be added as manifest annotations: +This mode is a natural consequence of canonical-EROFS and does not require additional specification beyond what is already defined for manifest annotations and canonical EROFS generation. + +### Recommended default algorithm + +The suggested default is `fsverity-sha512-12` - this maximizes compatibility as +not every system can support higher page sizes, and also maximizes security (there are +post-quantum crypto arguments against SHA-256). + +### Composefs Digest Storage + +Composefs digests — the fsverity digests of EROFS metadata images — can be stored as annotations. This is most relevant in canonical-EROFS mode, where the digest is the primary mechanism for verifying a locally-generated EROFS. In erofs-alongside mode, the EROFS metadata itself is shipped in the composefs artifact and the digest can be computed from it directly, so annotations serve mainly as a convenience for discovery. + +Digests can appear in two locations: + +1. **Composefs artifact** (primary): As annotations on the composefs artifact layers. This is the recommended approach because it allows signing existing unmodified OCI images — the original manifest is never touched. + +2. **Manifest annotations** (optional): As annotations on the image manifest layers. 
This is a convenience for tools that want to verify composefs digests without fetching a separate artifact. When both are present, they MUST agree. + +When using manifest annotations, in [the manifest](https://github.com/opencontainers/image-spec/blob/main/manifest.md), +each layer may have an annotation with a composefs digest. + +```json +{ + "layers": [ + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0", + "size": 32654, + "annotations": { + "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b", + "size": 16724, + "annotations": { + "composefs.layer.fsverity-sha512-12": "63e22ec2fbeebabf005e58fbfb0eee607c4aa417045a68a0cc63767b048e3559268d35e72f367d3b2dbd5dbddf12fc4397762ba149260b3795a0391713bddcd7" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736", + "size": 73109, + "annotations": { + "composefs.layer.fsverity-sha512-12": "2b59d179d9815994f687383a886ea34109889756efca5ab27318cc67ce2a21261d12fa6fee6b8c716f72214ead55ee0d789d6c35cff977d40ef5728ba9188a80" + } + } + ] +} +``` + +Additionally, an optional merged digest may be provided on the **final layer only**, representing the *flattened* merged filesystem tree of the complete stack of all layers. The rationale is that it makes it easier for a runtime to avoid the overhead of individual mounts if it chooses to do so. This is especially suitable for e.g. a "base image" whose stack of mounts would commonly be shared with higher level applications. 
```json { - "manifests": [ - { - "layers": [ - { - "digest": "sha256:...", - "annotations": { - "containers.composefs.layer.fsverity": "sha256:..." - } - } - ] + "layers": [ + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0", + "size": 32654, + "annotations": { + "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686" + } + }, + { + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b", + "size": 16724, + "annotations": { + "composefs.layer.fsverity-sha512-12": "63e22ec2fbeebabf005e58fbfb0eee607c4aa417045a68a0cc63767b048e3559268d35e72f367d3b2dbd5dbddf12fc4397762ba149260b3795a0391713bddcd7", + "composefs.merged.fsverity-sha512-12": "d015f70f8bee6cf6453dd5b771eec18994b861c646cec18e2a9dfdec93f631fbb9030e60cfc82b552d33b9a134312a876ef4e519bffe3ef872aefbd84e6198b3" + } } ] } ``` -Per-layer digests enable incremental verification during pull, create caching opportunities where shared layers have known composefs digests, and enable runtime choice between flattened versus layered mounting strategies. +Note: The `composefs.merged.fsverity-sha512-12` annotation appears only on the final layer and represents the complete flattened filesystem of all layers merged together. + +#### Whiteout Handling in Merged Filesystem + +The merged EROFS represents a fully flattened filesystem and is designed to be mounted directly, not stacked with other EROFS layers via overlayfs. During the merge process, OCI whiteouts (`.wh.*` files and opaque directory markers) are fully processed: files and directories marked for deletion in upper layers are removed from the merged result. 
The final merged EROFS contains no whiteout entries — it is a clean, whiteout-free snapshot of the complete filesystem tree as it would appear after all layers are applied. -### Trust Chain +### Signatures -The trust chain for composefs-verified OCI images flows from external signatures through the manifest to the complete filesystem: +#### Linux kernel fsverity signatures (recommended) + +The primary signature mechanism is Linux kernel [fsverity built-in signature verification](https://docs.kernel.org/filesystems/fsverity.html#built-in-signature-verification). The kernel's `FS_IOC_ENABLE_VERITY` ioctl accepts a PKCS#7 signature that is verified against the `.fs-verity` keyring. This provides a clear chain of trust: the same component that controls data access (the kernel) also validates the signature. The kernel additionally integrates with the [IPE](https://docs.kernel.org/admin-guide/LSM/ipe.html) (Integrity Policy Enforcement) subsystem. + +The recommended delivery mechanism for these signatures is a separate OCI artifact using the Referrer pattern, described below. This enables signing existing unmodified OCI images. + +Signatures MAY also be embedded as manifest annotations using a `.signature` suffix on digest annotations (e.g. `composefs.layer.fsverity-sha512-12.signature` with base64-encoded PKCS#7), though this requires modifying the image manifest. + +#### Digest-only verification (alternative) + +Kernel-based signing is not required. An implementation may instead rely on external trust in the composefs digests themselves — for example, by trusting the OCI manifest (verified via cosign/sigstore/GPG) and treating the composefs digest annotations as authoritative. 
In this model: ``` External signature (cosign/sigstore/GPG) ↓ signs -OCI Manifest (includes config descriptor) - ↓ digest reference -OCI Config (includes containers.composefs.fsverity label) - ↓ fsverity digest -Composefs EROFS image - ↓ contains -Complete merged filesystem tree +OCI Manifest (includes composefs digest annotations) + ↓ +Composefs EROFS image (verified against digest) + ↓ +Complete filesystem tree ``` -## Verification Process +The userspace tooling performing this verification must be trusted. A key benefit of composefs is that verification of large data is on-demand and continuous via the kernel's fsverity — the composefs digest covers the complete filesystem tree, so verifying it is cheap even though the underlying data may be large. -Verification begins by fetching the manifest from the registry and verifying the external signature on the manifest. The config descriptor is extracted from the manifest, and the config is fetched and verified to match the descriptor digest. The `containers.composefs.fsverity` label is extracted from the config, and the composefs image is mounted with fsverity verification. The kernel verifies the EROFS matches the expected fsverity digest. +#### Replacing diff_id validation -The security property is that signature verification happens once, while filesystem verification is delegated to kernel fs-verity with lazy or eager verification depending on mount options. +The OCI image specification requires a `diff_id` in the [image config](https://github.com/opencontainers/image-spec/blob/main/config.md) for each layer, which is the digest of the uncompressed tar stream. This is expensive to validate after extraction and provides no path to continual kernel-enforced verification. With composefs, validating `diff_id` becomes redundant: the composefs digest already cryptographically covers the complete filesystem tree derived from the layer. 
-## Metadata Schema +#### Composefs Artifact Structure -### Config Labels +Composefs data — signatures and optionally prebuilt EROFS metadata (erofs-alongside mode) — is stored as a separate OCI artifact, discoverable via the OCI referrer pattern. This follows the same approach as cosign: the composefs artifact references the sealed image through the `subject` field and can be found via the `/referrers` API. -The image config contains the following labels: +Signature layers are raw PKCS#7 DER-encoded blobs — exactly the format expected by `FS_IOC_ENABLE_VERITY`. No JSON wrapping or base64 encoding. Prebuilt EROFS layers (when present) are raw EROFS images. -The `containers.composefs.fsverity` label (string) contains the fsverity digest of the merged composefs EROFS in the format `:` where algorithm is `sha256` or `sha512`. +##### Artifact Manifest -The `containers.composefs.version` label (string, optional) contains the seal format version such as `1.0`. +The composefs artifact is an OCI image manifest following the [artifacts guidance](https://github.com/opencontainers/image-spec/blob/main/artifacts-guidance.md) pattern (empty config, content in layers): -### Descriptor Annotations +The provisioning mode is indicated by the `artifactType`: -A descriptor may have the following annotation: +- `application/vnd.composefs.erofs-alongside.v1` — the artifact contains prebuilt EROFS metadata layers alongside optional signatures +- `application/vnd.composefs.canonical.v1` *(future)* — the artifact contains only signatures; the client generates the EROFS locally -The `containers.composefs.layer.fsverity` annotation (string, optional) contains the fsverity digest of that individual layer. +This allows clients to discover which mode is available via the referrers API filtered by `artifactType`. 
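
As an illustration of mode discovery, the sketch below maps an `artifactType` string to a provisioning mode. The type and function names (`ProvisioningMode`, `mode_from_artifact_type`, `pick_mode`) are hypothetical and not part of composefs-rs, and the preference for erofs-alongside when both modes are advertised is an assumption rather than something this specification requires.

```rust
/// Artifact types defined by this specification.
pub const EROFS_ALONGSIDE_TYPE: &str = "application/vnd.composefs.erofs-alongside.v1";
pub const CANONICAL_TYPE: &str = "application/vnd.composefs.canonical.v1";

/// Hypothetical enum naming the two provisioning modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvisioningMode {
    ErofsAlongside,
    Canonical,
}

/// Map an `artifactType` string to a mode; unknown types yield `None`.
pub fn mode_from_artifact_type(artifact_type: &str) -> Option<ProvisioningMode> {
    match artifact_type {
        EROFS_ALONGSIDE_TYPE => Some(ProvisioningMode::ErofsAlongside),
        CANONICAL_TYPE => Some(ProvisioningMode::Canonical),
        _ => None,
    }
}

/// Choose a mode from the artifact types returned by a referrers query.
/// Assumption: erofs-alongside is preferred when both are available.
pub fn pick_mode(advertised: &[&str]) -> Option<ProvisioningMode> {
    let modes: Vec<ProvisioningMode> = advertised
        .iter()
        .filter_map(|t| mode_from_artifact_type(t))
        .collect();
    if modes.contains(&ProvisioningMode::ErofsAlongside) {
        Some(ProvisioningMode::ErofsAlongside)
    } else {
        modes.first().copied()
    }
}
```

A client would call `pick_mode` with the `artifactType` values found in the referrers response and fall back to digest-only verification when it returns `None`.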
-### Label versus Annotation Semantics +**EROFS-alongside example** (prebuilt EROFS on registry): -Config labels store the authoritative digest because the config represents container identity while the manifest is a transport artifact. Labels are part of the container specification and create a new artifact (sealed image) rather than mutating metadata. Manifest annotations are retained for discovery purposes, allowing registries to identify sealed images without parsing configs and enabling clients to optimize pull strategies. +```json +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "artifactType": "application/vnd.composefs.erofs-alongside.v1", + "config": { + "mediaType": "application/vnd.oci.empty.v1+json", + "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a", + "size": 2 + }, + "layers": [ + { + "mediaType": "application/vnd.composefs.v1.erofs", + "digest": "sha256:fff...", + "size": 8192, + "annotations": { + "composefs.erofs.type": "layer", + "composefs.digest": "3abb6677af34ac57...layer-1-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.v1.erofs", + "digest": "sha256:ggg...", + "size": 4096, + "annotations": { + "composefs.erofs.type": "layer", + "composefs.digest": "63e22ec2fbeeba...layer-2-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.v1.erofs", + "digest": "sha256:hhh...", + "size": 12288, + "annotations": { + "composefs.erofs.type": "merged", + "composefs.digest": "d015f70f8bee6c...merged-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:aaa...", + "size": 456, + "annotations": { + "composefs.signature.type": "manifest", + "composefs.digest": "ab12...manifest-fsverity-digest..." 
+ } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:bbb...", + "size": 789, + "annotations": { + "composefs.signature.type": "config", + "composefs.digest": "cd34...config-fsverity-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:ccc...", + "size": 1234, + "annotations": { + "composefs.signature.type": "layer", + "composefs.digest": "3abb6677af34ac57...layer-1-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:ddd...", + "size": 1234, + "annotations": { + "composefs.signature.type": "layer", + "composefs.digest": "63e22ec2fbeeba...layer-2-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:eee...", + "size": 1234, + "annotations": { + "composefs.signature.type": "merged", + "composefs.digest": "d015f70f8bee6c...merged-composefs-digest..." + } + } + ], + "subject": { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", + "size": 7682 + }, + "annotations": { + "composefs.algorithm": "fsverity-sha512-12" + } +} +``` + +**Canonical-EROFS example** *(future — not yet usable)*: + +```json +{ + "schemaVersion": 2, + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "artifactType": "application/vnd.composefs.canonical.v1", + "config": { + "mediaType": "application/vnd.oci.empty.v1+json", + "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a", + "size": 2 + }, + "layers": [ + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:aaa...", + "size": 456, + "annotations": { + "composefs.signature.type": "manifest", + "composefs.digest": "ab12...manifest-fsverity-digest..." 
+ } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:bbb...", + "size": 789, + "annotations": { + "composefs.signature.type": "config", + "composefs.digest": "cd34...config-fsverity-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:ccc...", + "size": 1234, + "annotations": { + "composefs.signature.type": "layer", + "composefs.digest": "3abb6677af34ac57...layer-1-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:ddd...", + "size": 1234, + "annotations": { + "composefs.signature.type": "layer", + "composefs.digest": "63e22ec2fbeeba...layer-2-composefs-digest..." + } + }, + { + "mediaType": "application/vnd.composefs.signature.v1+pkcs7", + "digest": "sha256:eee...", + "size": 1234, + "annotations": { + "composefs.signature.type": "merged", + "composefs.digest": "d015f70f8bee6c...merged-composefs-digest..." + } + } + ], + "subject": { + "mediaType": "application/vnd.oci.image.manifest.v1+json", + "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270", + "size": 7682 + }, + "annotations": { + "composefs.algorithm": "fsverity-sha512-12" + } +} +``` + +##### Layer Ordering -## Verification Modes +Each layer carries annotations that identify its role. Signature layers use `composefs.signature.type`; EROFS metadata layers (erofs-alongside mode only) use `composefs.erofs.type`. Both carry `composefs.digest` with the fsverity digest. This makes the artifact self-contained — a consumer can verify composefs digests using only the composefs artifact and the image layers, without requiring composefs annotations on the original image manifest. -### Eager Verification +The layers MUST appear in this order: -Eager verification occurs during image pull. The composefs image is immediately created and its digest is verified against the config label. 
This makes the container ready to mount immediately after pull and is suitable for boot scenarios where operations should be read-only. +1. **(erofs-alongside only)** N EROFS metadata entries with `composefs.erofs.type: "layer"` — one per manifest layer, in manifest order. Each is a raw EROFS metadata image. +2. **(erofs-alongside only)** Zero or one EROFS metadata entry with `composefs.erofs.type: "merged"` — the flattened merged EROFS for the complete image. +3. **(Optional)** One signature with `composefs.signature.type: "manifest"` — signature for the sealed image manifest, stored as a file with fsverity +4. **(Optional)** One signature with `composefs.signature.type: "config"` — signature for the image config, stored as a file with fsverity +5. N signature entries with `composefs.signature.type: "layer"` — one per manifest layer, in manifest order. Each signature is applied to the EROFS blob via `FS_IOC_ENABLE_VERITY`. +6. Zero or one signature with `composefs.signature.type: "merged"` — if present, this is the signature for the merged EROFS representing the complete flattened filesystem. -### Lazy Verification +Position within each group determines which source object the entry corresponds to. The number of `layer`-type entries (both EROFS and signature) MUST equal the number of layers in the source manifest. When an erofs-alongside EROFS layer and its corresponding signature layer both carry `composefs.digest`, they MUST agree. -Lazy verification defers composefs creation until first mount. The pull operation stores layers and config but doesn't build the composefs image. On mount, the composefs image is built and verified against the label. This mode is suitable for application containers where many images may be pulled but only some are actually used. +This design enables signing existing unmodified OCI images: compute composefs digests for each layer, sign them, and push the composefs artifact as a referrer. The original image is never touched. 
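
The ordering and count rules above could be checked roughly as follows. This is a minimal sketch, not code from composefs-rs: `EntryKind` and `check_layer_order` are hypothetical names, the caller is assumed to have already classified each artifact layer from its `composefs.erofs.type` or `composefs.signature.type` annotation, and treating an artifact with no signature entries at all as acceptable is an interpretation of the rules.

```rust
/// Hypothetical classification of artifact layers, declared in the
/// order in which the groups must appear in the artifact manifest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EntryKind {
    ErofsLayer,
    ErofsMerged,
    SigManifest,
    SigConfig,
    SigLayer,
    SigMerged,
}

/// Check group order, per-group multiplicity, and layer counts.
/// `source_layers` is the number of layers in the source image manifest.
pub fn check_layer_order(
    entries: &[EntryKind],
    source_layers: usize,
    erofs_alongside: bool,
) -> Result<(), String> {
    // Groups must appear in declaration order (derived `Ord`).
    if entries.windows(2).any(|w| w[0] > w[1]) {
        return Err("entries are not in the required group order".to_string());
    }
    let count = |k: EntryKind| entries.iter().filter(|&&e| e == k).count();

    // These groups hold at most one entry each.
    for k in [
        EntryKind::ErofsMerged,
        EntryKind::SigManifest,
        EntryKind::SigConfig,
        EntryKind::SigMerged,
    ] {
        if count(k) > 1 {
            return Err(format!("more than one {:?} entry", k));
        }
    }

    let erofs_layers = count(EntryKind::ErofsLayer);
    if erofs_alongside {
        if erofs_layers != source_layers {
            return Err("EROFS layer count differs from source manifest".to_string());
        }
    } else if erofs_layers != 0 || count(EntryKind::ErofsMerged) != 0 {
        return Err("EROFS entries are only valid in erofs-alongside mode".to_string());
    }

    // Assumption: signatures may be absent entirely; if any layer
    // signature is present there must be one per source layer.
    let sig_layers = count(EntryKind::SigLayer);
    if sig_layers != 0 && sig_layers != source_layers {
        return Err("signature layer count differs from source manifest".to_string());
    }
    Ok(())
}
```

Because position within each group identifies the corresponding source layer, a caller that passes this check can pair the i-th `ErofsLayer` and i-th `SigLayer` entries with the i-th layer of the source manifest.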
-## Security Model +##### Signature Format -### Registry-Provided Sealed Images +Each layer blob is a raw PKCS#7 signature encoded using [DER](https://en.wikipedia.org/wiki/X.690#DER_encoding) (Distinguished Encoding Rules, ITU-T X.690) over the kernel's `fsverity_formatted_digest`: -For images sealed by the registry or vendor, the seal is computed during the build process and the seal label is embedded in the published config. An external signature covers the manifest. Clients verify the chain: signature → manifest → config → composefs. Trust is placed in the image producer and the signature key. +```c +struct fsverity_formatted_digest { + char magic[8]; /* "FSVerity" */ + __le16 digest_algorithm; + __le16 digest_size; + __u8 digest[]; +}; +``` + +Composefs algorithm identifiers map to kernel constants with no salt: +- `fsverity-sha512-12` → `FS_VERITY_HASH_ALG_SHA512`, 4096-byte blocks +- `fsverity-sha256-12` → `FS_VERITY_HASH_ALG_SHA256`, 4096-byte blocks +- `fsverity-sha512-16` → `FS_VERITY_HASH_ALG_SHA512`, 65536-byte blocks +- `fsverity-sha256-16` → `FS_VERITY_HASH_ALG_SHA256`, 65536-byte blocks + +All entries in a single composefs artifact MUST use the same algorithm. The algorithm is declared in the `composefs.algorithm` annotation on the composefs artifact manifest (e.g. `fsverity-sha512-12`). + +For manifest and config signatures, the fsverity digest is computed over the exact JSON bytes as stored in the registry. These files are stored locally with fsverity enabled so that reads are kernel-verified. 
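
To make the signed payload concrete, the following sketch parses a composefs algorithm identifier and serializes the `fsverity_formatted_digest` bytes that a PKCS#7 signature would cover. The numeric values of `FS_VERITY_HASH_ALG_SHA256` (1) and `FS_VERITY_HASH_ALG_SHA512` (2) come from the kernel UAPI header `linux/fsverity.h`; the `Algorithm` struct and both function names are hypothetical and only serve the illustration.

```rust
/// Hash algorithm numbers from the kernel UAPI header `linux/fsverity.h`.
pub const FS_VERITY_HASH_ALG_SHA256: u16 = 1;
pub const FS_VERITY_HASH_ALG_SHA512: u16 = 2;

/// Hypothetical parsed form of identifiers such as `fsverity-sha512-12`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Algorithm {
    pub kernel_alg: u16,
    pub digest_size: usize,
    pub block_size: u32,
}

/// Parse one of the four identifiers listed in this section.
pub fn parse_algorithm(id: &str) -> Option<Algorithm> {
    let rest = id.strip_prefix("fsverity-")?;
    let (hash, log2_block) = rest.split_once('-')?;
    let (kernel_alg, digest_size) = match hash {
        "sha256" => (FS_VERITY_HASH_ALG_SHA256, 32),
        "sha512" => (FS_VERITY_HASH_ALG_SHA512, 64),
        _ => return None,
    };
    let block_size = match log2_block {
        "12" => 4096,
        "16" => 65536,
        _ => return None,
    };
    Some(Algorithm { kernel_alg, digest_size, block_size })
}

/// Serialize `struct fsverity_formatted_digest`: 8-byte magic, two
/// little-endian u16 fields, then the raw digest bytes.
pub fn formatted_digest(alg: Algorithm, digest: &[u8]) -> Option<Vec<u8>> {
    if digest.len() != alg.digest_size {
        return None; // digest length must match the algorithm
    }
    let mut out = Vec::with_capacity(12 + digest.len());
    out.extend_from_slice(b"FSVerity");
    out.extend_from_slice(&alg.kernel_alg.to_le_bytes());
    out.extend_from_slice(&(alg.digest_size as u16).to_le_bytes());
    out.extend_from_slice(digest);
    Some(out)
}
```

The resulting buffer is what a signing tool would hand to its PKCS#7 implementation; the block size does not appear in the structure itself because it is already reflected in the fsverity digest.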
+ +##### Discovery and Verification + +Discovery uses the standard [OCI Distribution Spec referrers API](https://github.com/opencontainers/distribution-spec/blob/main/spec.md#listing-referrers): +``` +GET /v2/<name>/referrers/<digest>?artifactType=application/vnd.composefs.erofs-alongside.v1 +GET /v2/<name>/referrers/<digest>?artifactType=application/vnd.composefs.canonical.v1 +``` + +Verification depends on the mode: + +**EROFS-alongside** (`artifactType: application/vnd.composefs.erofs-alongside.v1`): +1. Check `subject` matches the sealed image manifest digest +2. Extract EROFS metadata layers from the artifact +3. Fetch and unpack each tar layer; generate a canonical in-memory metadata representation (e.g. composefs dumpfile) from the tar and compare against the EROFS metadata; any disagreement is fatal +4. The EROFS metadata is used directly (no local generation needed) +5. If signature layers are present, apply them via `FS_IOC_ENABLE_VERITY` to the EROFS files +6. If the source manifest has composefs digest annotations, verify they match the artifact's `composefs.digest` values + +**Canonical-EROFS** *(future)* (`artifactType: application/vnd.composefs.canonical.v1`): +1. Check `subject` matches the sealed image manifest digest +2. Read `composefs.digest` annotations from signature layers (or from the source manifest annotations) to learn the expected fsverity digests +3. Generate the EROFS locally from the tar layers using the canonical process +4. Compute the fsverity digest of the locally generated EROFS and verify it matches the expected digest +5. If signature layers are present, apply them via `FS_IOC_ENABLE_VERITY` to the EROFS files + +In both modes, the kernel handles PKCS#7 validation when signatures are used: a file whose signature fails verification cannot be read.
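
The discovery request and the digest cross-check from the steps above can be sketched as follows. The path shape `/v2/<name>/referrers/<digest>` follows the OCI Distribution Spec; the function names are hypothetical, the artifact type is placed in the query string without percent-encoding on the assumption that the two types defined here contain only query-safe characters, and digests are compared as exact strings.

```rust
/// Build the referrers request path for a sealed image manifest.
/// `repository` is the registry repository name and `manifest_digest`
/// is the digest of the sealed image manifest (e.g. `sha256:...`).
pub fn referrers_path(repository: &str, manifest_digest: &str, artifact_type: &str) -> String {
    // Assumption: no percent-encoding is needed for the artifact types
    // defined by this specification.
    format!(
        "/v2/{}/referrers/{}?artifactType={}",
        repository, manifest_digest, artifact_type
    )
}

/// Step 1 in both modes: the artifact's `subject` digest must equal the
/// digest of the sealed image manifest being verified.
pub fn subject_matches(subject_digest: &str, image_manifest_digest: &str) -> bool {
    subject_digest == image_manifest_digest
}

/// Step 6 of erofs-alongside: if the source manifest carries a composefs
/// digest annotation, it must equal the artifact's `composefs.digest`.
/// A missing annotation means there is nothing to compare.
pub fn annotation_matches(manifest_annotation: Option<&str>, artifact_digest: &str) -> bool {
    match manifest_annotation {
        None => true,
        Some(digest) => digest == artifact_digest,
    }
}
```

A client would issue a `GET` for `referrers_path(...)`, reject any artifact for which `subject_matches` is false, and treat a failed `annotation_matches` as a fatal verification error.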
+ +``` +External CA/Keystore + ↓ issues certificate for .fs-verity keyring +PKCS#7 signatures (from artifact layers) + ↓ applied via FS_IOC_ENABLE_VERITY to each file +Manifest JSON, Config JSON, EROFS layer blobs + ↓ kernel fsverity enforcement on every read +Runtime file access +``` + +##### Implementation Considerations -### Client-Sealed Images +Kernel-level signature verification depends on Linux kernel fsverity (CONFIG_FS_VERITY, CONFIG_FS_VERITY_BUILTIN_SIGNATURES). Signature validation and file access enforcement are handled by the Linux kernel. -For images sealed locally by the client, the client pulls an image that may be unsigned and computes the seal locally. The client stores the sealed config in its local repository. On boot or mount, the client can re-fetch the manifest from the network to verify freshness. Trust is placed in the network fetch (TLS) and local verification. +When signatures are present, the manifest and config signature entries MUST also be present — there is no reason to sign individual layers without also signing the manifest and config that reference them. The merged entry remains optional. -## Attack Mitigation +In erofs-alongside mode, the EROFS `layer` group MUST always be present (that is the primary purpose of the artifact). Signature layers are optional — an erofs-alongside artifact without signatures is valid and supports digest-only verification. This is the expected common case: a composefs artifact is attached to an existing image to provide EROFS metadata, without requiring the publisher to have signing keys. -### Digest Mismatch +In canonical-EROFS mode, the composefs artifact exists only to carry signatures (the EROFS is generated locally). If an implementation uses digest-only verification, it does not need a composefs artifact at all — the `composefs.layer.*` annotations on the image manifest are sufficient. -If a config label doesn't match the actual EROFS, the mount operation fails the fsverity check. 
Verification APIs can detect this condition before mounting. +Clients that pull images with composefs artifacts are expected to also store the artifact locally alongside the image (it's just a small amount of metadata), and to attach the signatures to the corresponding files at the Linux kernel level. This enables offline verification and allows fsverity signatures to be applied when files are later accessed. However, local storage of the artifact is not strictly required — a client could re-fetch the artifact from the registry when needed, or operate in digest-only mode where the composefs digests themselves are trusted without kernel signature verification. -### Signature Bypass +Implementations should focus on erofs-alongside mode, which works today. Once the canonical EROFS specification is finalized, implementations SHOULD support both modes. -Any attempt to modify the config label without updating the signature fails because the signature covers the manifest, which covers the config digest. Any config change produces a new digest, breaking the signature chain. +##### Media Types -### Rollback Attack +- `application/vnd.composefs.erofs-alongside.v1`: Artifact type for erofs-alongside composefs artifacts (EROFS metadata + optional signatures) +- `application/vnd.composefs.canonical.v1`: Artifact type for canonical-EROFS composefs artifacts (signatures only) +- `application/vnd.composefs.v1.erofs`: Layer media type for prebuilt EROFS metadata images (erofs-alongside only) +- `application/vnd.composefs.signature.v1+pkcs7`: Layer media type for PKCS#7 DER signature blobs -For application containers, re-fetching the manifest on boot checks for freshness. For host systems, embedding the manifest in the boot artifact prevents rollback. +## Storage model -### Layer Confusion +It is recommended to store the config, manifest and unpacked layers. -Per-layer fsverity annotations allow verification before merging. 
Implementations that maintain digest maps can link layer SHA256 digests to fsverity digests. +In erofs-alongside mode, the prebuilt EROFS is fetched from the registry and stored directly. In canonical-EROFS mode, the EROFS is generated locally on-demand or cached (indexed by manifest digest). In either case, the composefs artifact itself should be stored locally to enable offline signature verification. ## Relationship to Booting with composefs OCI sealing is independent from but complementary to composefs boot verification (UKI, BLS, etc.). These are separate mechanisms operating at different stages of the system lifecycle with different trust models. -OCI sealing provides runtime verification of container images distributed through registries. The trust chain typically flows from external signatures (cosign, GPG) through OCI manifests to composefs digests. +It is expected that boot-sealed images would *also* be OCI sealed, although this is not strictly required. -Boot verification is designed to be rooted in extant hardware mechanisms such as Secure Boot. The composefs digest is embedded directly in boot artifacts (UKI `.cmdline` section, BLS entry `options` field) and verified during early boot by the initramfs. +### Bootable composefs UKI and kernel command line -These mechanisms work together in a complete workflow where a sealed OCI image can be pulled from a registry, verified through OCI sealing, and then used to build a boot artifact with the composefs digest embedded for boot verification. However, each mechanism operates independently with its own trust anchor and threat model. +The default model implemented is that the UKI's kernel command line includes the fsverity digest of a slightly modified EROFS (without `/boot` among other things). This currently relies on canonical-EROFS mode since the digest must match between what the UKI embeds at build time and what the client generates at boot time. 
+ +With erofs-alongside mode, it would also be possible to instead load signing keys into the kernel fsverity chain from the initramfs (which may be the same or different keys used for application images), and use the composefs artifact signature scheme for mounting the root filesystem from the initramfs. This would remove the dependency on canonical EROFS generation for boot. ## Future Directions +### Incremental Pulls via EROFS-alongside + +In erofs-alongside mode, the EROFS metadata contains fsverity digests of all content objects, so the client can determine which objects it already has locally and only fetch the missing ones from the tar layer. The EROFS effectively acts as a table of contents — a metadata format that is natively supported by the Linux kernel and has multiple userspace parsers. + +To support this, a small amount of additional metadata would be needed: for each non-inline file, an offset mapping (`{ erofs_inode, layer_offset }`) so the client can locate objects within the (compressed) tar layer without downloading the whole thing. This would require the layers to use a seekable compression format (e.g. zstd:chunked, estargz, or future [registry-level compression](https://github.com/opencontainers/distribution-spec/issues/235)). + ### Dumpfile Digest as Canonical Identifier -The fsverity digest ties implementations to a specific EROFS format. A dumpfile digest (SHA256 of the composefs dumpfile format) would enable format evolution. This would be stored as an additional label `containers.composefs.dumpfile.sha256` alongside the fsverity digest. +The fsverity digest ties implementations to a specific EROFS format; for more details on this, see [this issue](https://github.com/composefs/composefs/issues/198). A dumpfile digest (classic SHA or fsverity digest) of the composefs dumpfile format would enable format evolution. 
+ +This would also be stored as an annotation: -The dumpfile format is format-agnostic, meaning the same dumpfile can generate different EROFS versions. This simplifies standardization since the dumpfile format is simpler than EROFS and provides future-proofing to migrate to composefs-over-squashfs or other formats. +```json +{ + "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip", + "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0", + "size": 32654, + "annotations": { + "composefs.layer.fsverity-sha512-12": "3abb6677af34ac57c0ca5828fd94f9d886c26ce59a8ce60ecf6778079423dccff1d6f19cb655805d56098e6d38a1a710dee59523eed7511e5a9e4b8ccb3a4686", + "composefs.layer.fsverity-sha512-12.signature": "MIIBkgYJKo...base64-encoded-pkcs7...", + "composefs.dumpfile.sha512": "62d4b68bc4d336ff0982b93832d9a1f1d40206b49218299e5ac2e50f683d23f17bb99a1f3805339232abebd702eeda204827cfde244bf833e42b67a2fe632dc0" + } +} +``` -The challenge is that verification becomes slower as it requires parsing a saved EROFS from disk to dumpfile format. Caching the dumpfile digest to fsverity digest mapping introduces complexity and security implications. A use case split might apply dumpfile digests to application containers (for format flexibility) while using fsverity digests for host boot (for speed with minimal skew). +A downside though is that because the mapping from the tar layer to the EROFS was not pre-computed server side, there is no way to attach a kernel-native signature. However, it does still allow efficient validation of the complete filesystem tree, given only the saved metadata (e.g. tar-split or splitstream) in combination with the fsverity digests of content. ### Integration with zstd:chunked @@ -173,10 +530,6 @@ Both zstd:chunked and composefs add new digests to OCI images. The zstd:chunked Adding fsverity to zstd:chunked TOC entries would allow using the TOC digest as a canonical composefs identifier. 
This would support a direct TOC → dumpfile → composefs pipeline, with a single metadata format serving both zstd:chunked and composefs use cases. -### Three-Digest Model - -To support both flattened and layered mounting strategies, three digests could be stored per image: a base image digest, a derived layers digest, and a flattened digest. This would enable mounting a single flattened composefs for speed, mounting base and derived separately to avoid metadata amplification, or verifying the base from upstream while only rebuilding derived layers. This aligns with the existing `org.opencontainers.image.base.digest` standard. - ## References **Design discussion**: [composefs/composefs#294](https://github.com/composefs/composefs/issues/294) @@ -192,8 +545,7 @@ To support both flattened and layered mounting strategies, three digests could b **Standards**: - [OCI Image Specification](https://github.com/opencontainers/image-spec) -- [Canonical JSON](https://wiki.laptop.org/go/Canonical_JSON) ## Contributors -This specification synthesizes ideas from Colin Walters (original design proposals and iteration), Allison Karlitskaya (implementation and practical refinements), and Alexander Larsson (security model and non-root mounting insights). Significant assistance from Claude Sonnet 4.5 was used in synthesis. +This specification synthesizes ideas from Colin Walters (original design proposals and iteration), Allison Karlitskaya (implementation and practical refinements), Alexander Larsson (security model and non-root mounting insights), and Giuseppe Scrivano (across the board) with assistance from Claude Sonnet 4.5 and Claude Opus 4. 
diff --git a/doc/plans/standardized-erofs-meta.md b/doc/plans/standardized-erofs-meta.md new file mode 100644 index 00000000..0413a01a --- /dev/null +++ b/doc/plans/standardized-erofs-meta.md @@ -0,0 +1,83 @@ +# Standardized EROFS Metadata Serialization + +This document outlines the goal of standardizing how composefs serializes filesystem trees to EROFS metadata images. + +## Relationship to OCI Sealing Modes + +The [OCI sealing specification](oci-sealing-spec.md) defines two EROFS provisioning modes. This standardization work is specifically required for **canonical-EROFS mode**, where the client generates the EROFS locally and must produce a byte-identical result to what the server (or any other implementation) would generate. + +**EROFS-alongside mode** does not require this standardization because the publisher ships the exact EROFS bytes to clients. EROFS-alongside can be used today without solving the problems described here. + +However, even in erofs-alongside mode, a canonical dumpfile representation is valuable for the consistency check between the tar layer and the prebuilt EROFS (see erofs-alongside verification in the OCI sealing spec). + +## Goal + +Standardize how a filesystem tree, expressed canonically as a composefs dumpfile (or equivalent representation), is serialized to EROFS metadata. This enables reproducible EROFS generation across implementations and is a prerequisite for canonical-EROFS mode in the OCI sealing specification. + +## Conceptual Model + +The canonical transformation model is: + +``` +tar layer → dumpfile → EROFS metadata +``` + +Even when implementations optimize by going directly from tar to EROFS for efficiency, the canonical model remains tar → dumpfile → EROFS. This means: + +1. Two implementations processing the same tar layer should produce equivalent dumpfiles +2. Two implementations processing the same dumpfile MUST produce byte-identical EROFS images +3. 
Therefore, two implementations processing the same tar layer should produce byte-identical EROFS images + +The dumpfile serves as the canonical intermediate representation that defines the filesystem tree independent of serialization format. + +## Why This Matters + +- **Canonical-EROFS OCI sealing**: Canonical-EROFS mode in the OCI sealing specification depends entirely on this standardization. Without it, fsverity digests computed by different implementations would not match, and signatures would fail to verify. +- **Reproducible EROFS generation**: Given identical inputs, composefs-c, composefs-rs, and any future implementations must produce byte-for-byte identical EROFS images +- **Ecosystem compatibility**: Container runtimes, build tools, and registries can use different implementations interchangeably +- **UKI boot**: The sealed UKI boot model embeds a composefs digest in the kernel command line, which must match the EROFS generated at boot time — this is inherently a canonical-EROFS use case + +Note: EROFS-alongside mode provides an alternative path that avoids these requirements, at the cost of shipping EROFS metadata on the registry. See [oci-sealing-spec.md](oci-sealing-spec.md) for a comparison. 
+ +## Current State + +This standardization is a work in progress: + +- **[composefs/composefs#423](https://github.com/composefs/composefs/discussions/423)**: Discussion on compatible EROFS output across implementations +- **[composefs-rs PR #225](https://github.com/composefs/composefs-rs/pull/225)**: Initial reimplementation of composefs-c in Rust, with compatible EROFS output as a key goal + +## Open Questions + +The following details need to be standardized (future work): + +### EROFS Format Options +- EROFS format version and feature flags +- Block size (currently 4096) +- Compression settings (composefs uses uncompressed metadata) + +### Inode Representation +- Compact vs extended inode format +- Inode numbering scheme +- Handling of hardlinks (inode sharing) + +### Metadata Ordering +- Inode table ordering (depth-first? breadth-first? by path?) +- Directory entry ordering within directories +- Xattr key ordering within an inode +- Shared xattr table construction algorithm + +### Content Handling +- Inline data threshold (currently ~64 bytes for external, but exact cutoff matters) +- External file references via overlay metacopy xattrs +- Symlink target storage + +### OCI-Specific Concerns +- Whiteout representation (should not appear in final EROFS — processed during merge) +- Root inode metadata normalization (copying from `/usr`) +- Timestamp precision (seconds only, matching tar limitations) + +## References + +- [Splitstream binary format](../splitstream.md) — related binary format for storing tar data +- [OCI sealing specification](oci-sealing-spec.md) — depends on reproducible EROFS generation +- [EROFS documentation](https://docs.kernel.org/filesystems/erofs.html) — kernel filesystem documentation