Conversation

@arreyder (Contributor) commented Jan 9, 2026

Summary

Two defensive fixes to prevent corrupt c1z files from being saved/uploaded:

  1. fsync before compression: Add an explicit fsync on the sqlite database file after Close() but before saveC1z() reads it
  2. Size validation in SaveC1Z: Detect truncated copies by comparing the bytes copied to the source file size

Problem

Production incidents showed c1z corruption where:

  • Sync succeeded
  • SaveC1Z logged success with byte count
  • But the uploaded file had a truncated zstd stream (missing end-of-stream marker)
  • Error: "corrupt stream, did not find end of stream"

The zstd end-of-stream marker is written by zstd.Close(). For it to be missing, truncation must occur during or after compression - either in saveC1z() itself or during the copy in SaveC1Z().
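
To make the failure mode concrete, here is a minimal, self-contained sketch of how a truncated zstd stream fails to decode. It assumes github.com/klauspost/compress/zstd purely for illustration; the exact library and error text used by the c1z reader may differ.

package main

import (
    "bytes"
    "fmt"
    "io"

    "github.com/klauspost/compress/zstd"
)

func main() {
    // Compress a payload; Encoder.Close() flushes the final block and
    // writes the frame's end-of-stream marker.
    var buf bytes.Buffer
    enc, err := zstd.NewWriter(&buf)
    if err != nil {
        panic(err)
    }
    if _, err := enc.Write([]byte("example payload")); err != nil {
        panic(err)
    }
    if err := enc.Close(); err != nil {
        panic(err)
    }

    // Simulate a short copy by dropping the last few bytes of the stream.
    truncated := buf.Bytes()[:buf.Len()-4]

    dec, err := zstd.NewReader(bytes.NewReader(truncated))
    if err != nil {
        panic(err)
    }
    defer dec.Close()

    // Decoding fails because the frame never terminates cleanly.
    _, err = io.ReadAll(dec)
    fmt.Println("decode error:", err)
}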

Root Cause Analysis

Layer | What could go wrong | Fix
sqlite → saveC1z | Stale data in kernel buffers on ZFS/aggressive caching | fsync before read
saveC1z → SaveC1Z | Truncated io.Copy | Size validation

Why fsync helps (defense in depth)

On filesystems with aggressive caching (like ZFS with ARC), data written by sqlite during Close() may still be in kernel buffers. Without explicit fsync, saveC1z() could theoretically read stale/incomplete data.

Why size validation catches the actual symptom

The observed corruption was a truncated zstd stream. If io.Copy in SaveC1Z() returns fewer bytes than the source file size, we now fail immediately rather than saving a corrupt file.

Changes

pkg/dotc1z/c1file.go

// After rawDb.Close(), before saveC1z():
dbFile, err := os.OpenFile(c.dbFilePath, os.O_RDONLY, 0)
if err != nil { ... }
if err := dbFile.Sync(); err != nil { ... }
dbFile.Close()

pkg/dotc1z/manager/local/local.go

// Before io.Copy:
srcStat, err := tmpFile.Stat()
if err != nil { ... }
expectedSize := srcStat.Size()

// After io.Copy:
if size != expectedSize {
    return fmt.Errorf("copy truncated: copied %d bytes, expected %d bytes", size, expectedSize)
}
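
For illustration only, the same check as a self-contained helper. copyFull is a hypothetical name rather than the function the PR touches, and it assumes the source file is positioned at offset 0 when the copy starts.

package example

import (
    "fmt"
    "io"
    "os"
)

// copyFull copies src into dst and fails fast when the number of bytes
// written does not match the source file's size.
func copyFull(dst io.Writer, src *os.File) (int64, error) {
    st, err := src.Stat()
    if err != nil {
        return 0, fmt.Errorf("failed to stat source file: %w", err)
    }
    expectedSize := st.Size()

    n, err := io.Copy(dst, src)
    if err != nil {
        return n, fmt.Errorf("copy failed: %w", err)
    }
    if n != expectedSize {
        return n, fmt.Errorf("copy truncated: copied %d bytes, expected %d bytes", n, expectedSize)
    }
    return n, nil
}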

Cost

Fix | CPU | I/O | Latency
fsync | ~0 | 1 disk sync | ~1-10ms
Size validation | ~0 | 1 stat() | <1ms

Total: <15ms - negligible for sync operations that take minutes.

Test plan

  • go build ./pkg/dotc1z/... passes
  • go test ./pkg/dotc1z/... passes (113s, all tests pass)
  • Deploy to staging and monitor for c1z corruption incidents
  • Verify size validation error is logged if truncation occurs

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Ensured on-disk buffers are flushed before creating compressed outputs to reduce risk of data loss.
    • Added strict file-size verification after copy operations to detect and fail on truncated transfers.
    • Improved error reporting and handling around file copy and sync steps to surface failures earlier.

Two defensive fixes to prevent corrupt c1z files from being saved:

1. Add explicit fsync on sqlite database file after Close() but before
   saveC1z() reads it for compression. On filesystems with aggressive
   caching (like ZFS with ARC), data may still be in kernel buffers.

2. Add size validation in SaveC1Z to detect truncated copies. Compares
   bytes copied to source file size and fails if they don't match. This
   catches the actual symptom we observed: truncated zstd streams missing
   the end-of-stream marker.

Root cause investigation: Production incidents showed c1z corruption where
sync succeeded and SaveC1Z logged success, but the uploaded file had a
truncated zstd stream (missing end-of-stream marker). The size validation
would catch this scenario and fail fast rather than uploading corrupt data.

Cost: ~5ms for fsync + <1ms for stat - negligible for sync operations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds a pre-compression fsync step that opens the DB file read-write and calls Sync when modifications exist, plus additional file-size/stat validation after file-copy operations to detect truncation before finalizing C1Z save.

Changes

Cohort / File(s) | Summary
C1Z pre-sync (pkg/dotc1z/c1file.go) | Adds a pre-compression filesystem synchronization in CloseContext: opens the DB file read‑write, calls Sync, closes it, and returns cleanup-wrapped errors on failure. Retains WAL checkpoint behavior and subsequent saveC1z call.
Copy integrity checks (pkg/dotc1z/manager/local/local.go) | In SaveC1Z and copyFileToTmp compute expectedSize via Stat, verify io.Copy wrote the full size, and return explicit errors on stat failures or size mismatches; keeps explicit fsync steps.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nudged the disk with a careful paw,
I counted bytes till I saw them all,
A sync, a click, a safe little zip—
Hooray for files on a perfect trip! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title directly and clearly summarizes the main changes: adding fsync and size validation to prevent c1z file corruption, which matches the core objectives of the PR.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e24cb0 and 670333b.

📒 Files selected for processing (1)
  • pkg/dotc1z/c1file.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/dotc1z/c1file.go
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: go-test (1.25.2, windows-latest)
  • GitHub Check: go-test (1.25.2, ubuntu-latest)

Sync() on a read-only file descriptor fails on Windows because
FlushFileBuffers requires write access. Changed from O_RDONLY to
O_RDWR so the fsync works correctly on both Linux and Windows.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
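
For reference, a minimal sketch of the resulting sync step; the function name and error messages are illustrative, not the SDK's actual code.

package example

import (
    "fmt"
    "os"
)

// syncToDisk opens path read-write (Sync on a read-only handle fails on
// Windows because FlushFileBuffers requires write access), flushes kernel
// buffers to stable storage, and closes the handle.
func syncToDisk(path string) error {
    f, err := os.OpenFile(path, os.O_RDWR, 0)
    if err != nil {
        return fmt.Errorf("failed to open file for sync: %w", err)
    }
    if err := f.Sync(); err != nil {
        f.Close()
        return fmt.Errorf("failed to fsync file: %w", err)
    }
    return f.Close()
}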

Copilot AI left a comment

Pull request overview

This PR implements two defensive fixes to prevent c1z file corruption observed in production where compressed files had truncated zstd streams missing end-of-stream markers.

Key changes:

  • Added explicit fsync after SQLite database close but before compression to ensure data is on disk
  • Added size validation after file copy operations to detect truncated transfers

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
pkg/dotc1z/c1file.go | Adds fsync call on database file after SQLite Close() and before compression to ensure all data is written to disk, particularly important on filesystems with aggressive caching
pkg/dotc1z/manager/local/local.go | Adds size validation by comparing bytes copied via io.Copy to source file size to detect and fail on truncated transfers before saving corrupt files
