dumpfile: Use named escapes and only escape '=' in xattr fields #271

Open

cgwalters wants to merge 3 commits into composefs:main from cgwalters:prep-dumpfile-named-escapes

@cgwalters (Collaborator)

Align dumpfile generation more closely with the composefs C implementation, both on general principle and because we aim to reimplement it in Rust here.

The C composefs implementation uses named escapes for backslash, newline, carriage return, and tab (\\ \n \r \t), while our writer was hex-escaping them uniformly (\x5c, \x0a, etc.). Both forms parse correctly, but byte-identical output matters for cross-implementation comparison.
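
A minimal sketch of the named-escape rule, using a hypothetical `escape_named` helper over raw bytes (not the crate's actual writer). Backslash, newline, carriage return, and tab get named escapes; other ASCII graphic characters (the C-locale `isgraph()` set) pass through; everything else, including space, becomes `\xNN`. The xattr-only '=' rule is left out here.

```rust
use std::fmt::Write;

/// Hypothetical helper: escape one dumpfile field using named escapes
/// for `\\`, `\n`, `\r`, `\t` and `\xNN` for other non-graphic bytes.
fn escape_named(input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    for &b in input {
        match b {
            b'\\' => out.push_str("\\\\"),
            b'\n' => out.push_str("\\n"),
            b'\r' => out.push_str("\\r"),
            b'\t' => out.push_str("\\t"),
            // ASCII graphic characters (like C's isgraph() in the C locale)
            // are emitted verbatim; this includes '='.
            0x21..=0x7e => out.push(b as char),
            // Space, other control bytes, and non-ASCII bytes are hex-escaped.
            _ => write!(out, "\\x{:02x}", b).unwrap(),
        }
    }
    out
}
```

For example, a field containing `a\b`, a newline, and a space comes out as `a\\b\n\x20` rather than `a\x5cb\x0a\x20`.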

Similarly, C only escapes '=' in xattr key/value fields (where it separates key from value). We were escaping it as \x3d in all fields including paths and content, where '=' is a normal graphic character.
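
The reason '=' needs escaping only in xattr fields is that a reader splits `KEY=VALUE` at the first literal '='. A small sketch with two hypothetical helpers, `split_xattr` and `escape_xattr_equals`; the latter deliberately handles only '=' and assumes the general escaping is applied separately:

```rust
/// Hypothetical helper: split a raw (still-escaped) xattr field `KEY=VALUE`
/// at the first literal '='. This is only unambiguous if the writer never
/// emits a literal '=' inside KEY.
fn split_xattr(field: &str) -> Option<(&str, &str)> {
    field.split_once('=')
}

/// Hypothetical helper: escape '=' as `\x3d` for an xattr key or value.
/// Illustrative only; a real writer combines this with the other escapes.
fn escape_xattr_equals(s: &str) -> String {
    s.replace('=', "\\x3d")
}
```

Without the escape, a key such as `user.a=b` would be misread as key `user.a`. Path and content fields are never split this way, so a literal '=' there is unambiguous.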

Assisted-by: OpenCode (Claude Opus 4)

cgwalters force-pushed the prep-dumpfile-named-escapes branch 2 times, most recently from f2a0b27 to a3bb7c2 (March 17, 2026 22:39)

The composefs-dump(5) spec leaves several fields unspecified or
explicitly ignored. Canonicalize them at parse time so that parsed
entries have a single canonical representation regardless of which
implementation produced them:

- **Directory sizes**: "This is ignored for directories." Drop the
  size field from Item::Directory and always emit 0.

- **Hardlink metadata**: "We ignore all the fields except the
  payload." Zero uid/gid/mode/mtime and skip xattrs, matching the
  C parser, which bails out early (mkcomposefs.c:477-491).

- **Xattr ordering**: The spec doesn't define an order. Sort
  lexicographically so output is deterministic regardless of
  on-disk ordering.

The parser still accepts any input values for backward compatibility.
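
A rough sketch of these canonicalization rules, using a simplified, hypothetical `Entry`/`Meta` model rather than the crate's real `Item` types:

```rust
/// Hypothetical, simplified metadata; not the crate's real types.
#[derive(Debug, Clone, Default, PartialEq)]
struct Meta {
    uid: u32,
    gid: u32,
    mode: u32,
    mtime: u64,
    xattrs: Vec<(Vec<u8>, Vec<u8>)>,
}

#[derive(Debug, Clone, PartialEq)]
enum Entry {
    // No size field: size is ignored for directories, so a writer emits 0.
    Directory { meta: Meta },
    Regular { meta: Meta, size: u64 },
    // Only the payload (link target) is meaningful for hardlinks.
    Hardlink { meta: Meta, target: String },
}

/// Normalize fields the spec leaves unspecified or ignored.
fn canonicalize(entry: &mut Entry) {
    match entry {
        Entry::Hardlink { meta, .. } => {
            // Zero uid/gid/mode/mtime and drop all xattrs.
            *meta = Meta::default();
        }
        Entry::Directory { meta } | Entry::Regular { meta, .. } => {
            // Sort xattrs lexicographically by key bytes for deterministic output.
            meta.xattrs.sort_by(|a, b| a.0.cmp(&b.0));
        }
    }
}
```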

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
XFS limits symlink targets to 1024 bytes, and since generic Linux
containers are commonly backed by XFS, enforce that limit in both
the dumpfile parser and the EROFS reader rather than allowing up to
PATH_MAX (4096).
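
A sketch of the length check with a hypothetical `validate_symlink_target` helper. Treating exactly 1024 bytes as allowed is this sketch's assumption, not something stated above:

```rust
/// Maximum symlink target length in bytes, matching the XFS limit cited above.
const MAX_SYMLINK_TARGET: usize = 1024;

#[derive(Debug, PartialEq)]
enum SymlinkError {
    /// Carries the offending length.
    TooLong(usize),
}

/// Hypothetical helper: reject targets longer than the limit instead of
/// allowing anything up to PATH_MAX.
fn validate_symlink_target(target: &[u8]) -> Result<(), SymlinkError> {
    if target.len() > MAX_SYMLINK_TARGET {
        return Err(SymlinkError::TooLong(target.len()));
    }
    Ok(())
}
```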

This also avoids exercising a known limitation in our EROFS reader,
where symlink data that spills into a non-inline data block (which
can happen with long symlinks plus xattrs) is not read back
correctly. See composefs/composefs#342 for the corresponding C fix
for that edge case.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>
Align dumpfile generation more closely with the composefs C
implementation, both on general principle and because we aim to
reimplement it in Rust here.

The C composefs implementation uses named escapes for backslash,
newline, carriage return, and tab (\\ \n \r \t), while our writer
was hex-escaping them uniformly (\x5c, \x0a, etc.). Both forms parse
correctly, but byte-identical output matters for cross-implementation
comparison.
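
To illustrate why both forms parse, here is a minimal sketch of a hypothetical `unescape` helper that accepts named escapes and `\xNN` hex escapes and rejects anything else. It is not the crate's actual parser:

```rust
/// Hypothetical sketch: decode a dumpfile field, accepting both named
/// escapes (`\\`, `\n`, `\r`, `\t`) and `\xNN` hex escapes.
/// Returns None on an unknown or truncated escape.
fn unescape(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] != b'\\' {
            out.push(bytes[i]);
            i += 1;
            continue;
        }
        // Look at the byte after the backslash; a trailing '\' yields None.
        match bytes.get(i + 1).copied()? {
            b'\\' => { out.push(b'\\'); i += 2; }
            b'n' => { out.push(b'\n'); i += 2; }
            b'r' => { out.push(b'\r'); i += 2; }
            b't' => { out.push(b'\t'); i += 2; }
            b'x' => {
                // Exactly two hex digits must follow.
                let hex = input.get(i + 2..i + 4)?;
                if !hex.bytes().all(|c| c.is_ascii_hexdigit()) {
                    return None;
                }
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 4;
            }
            _ => return None,
        }
    }
    Some(out)
}
```

With a decoder like this, `\n` and `\x0a` yield the same byte, so switching the writer to named escapes changes only the textual form, not the parsed result.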

Similarly, C only escapes '=' in xattr key/value fields (where it
separates key from value). We were escaping it as \x3d in all fields
including paths and content, where '=' is a normal graphic character.

Assisted-by: OpenCode (Claude Opus 4)
Signed-off-by: Colin Walters <walters@verbum.org>

cgwalters force-pushed the prep-dumpfile-named-escapes branch from a3bb7c2 to 59b7b3a (March 18, 2026 01:04)