-
Notifications
You must be signed in to change notification settings - Fork 4
atomic write on saveC1z, maybe stop shipping bad c1z #606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughSave routines were changed to write to a temporary file (os.CreateTemp), sync and close it, then atomically rename to the final path; temp-file cleanup and more robust Close()/error handling were added. Tests and a benchmark were added to validate atomic-write behavior and performance. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
pkg/dotc1z/file_test.go (1)
120-132: Test may not exercise temp file cleanup path.When
dbFilePathdoesn't exist,saveC1zfails at line 67 (os.Open), before the temp file is created at line 83. This test validates that no temp files exist, but the cleanup code in the defer is never actually exercised.Consider injecting a failure after the temp file is created (e.g., mocking a write error, or using a read-only output directory for
Syncto fail) to truly test the cleanup path.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pkg/dotc1z/file.gopkg/dotc1z/file_test.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2024-10-08T21:29:30.695Z
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-10-08T21:29:30.695Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `cleanup` method of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
Applied to files:
pkg/dotc1z/file.go
📚 Learning: 2024-09-03T15:53:47.572Z
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-09-03T15:53:47.572Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `NewDBCache` function of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
Applied to files:
pkg/dotc1z/file.go
🪛 GitHub Check: go-lint
pkg/dotc1z/file_test.go
[failure] 80-80:
unnecessary trailing newline (whitespace)
[failure] 85-85:
Comment should end in a period (godot)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: go-test (1.25.2, ubuntu-latest)
- GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (4)
pkg/dotc1z/file.go (2)
70-76: LGTM!Proper nil-guard and error logging for deferred cleanup. This ensures close errors are visible without masking the primary return value.
78-98: Well-designed atomic write pattern.Creating the temp file in the same directory as the output ensures they're on the same filesystem, which is required for atomic rename. The unconditional
os.Remove(tmpPath)in defer is safe—after a successful rename, the temp file no longer exists at that path, so the remove silently fails.pkg/dotc1z/file_test.go (2)
13-42: LGTM!Good benchmark coverage with varying input sizes. The use of
b.ResetTimer()correctly excludes setup time from measurements.
134-144: LGTM!This correctly validates that no output file is created when the source file is missing.
| // Atomic rename: outputFilePath now has complete, valid data | ||
| // This is the only point where outputFilePath is modified | ||
| if err = os.Rename(tmpPath, outputFilePath); err != nil { | ||
| return fmt.Errorf("failed to rename temp file to output: %w", err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🌐 Web query:
Go os.Rename Windows behavior destination file exists atomicity
💡 Result:
Short answer
- os.Rename on Windows does NOT atomically replace an existing destination file. On non‑Unix platforms (Windows) Rename is not guaranteed atomic and will fail or behave differently when the destination exists. Use a platform‑specific replace (MoveFileEx with REPLACE_EXISTING) or an atomic helper package. [1][2][3]
Details & practical options
- The Go runtime/OS docs state Rename may replace dst on Unix but "Even within the same directory, on non‑Unix platforms Rename is not an atomic operation." (Windows has different semantics). [3]
- The Go issue tracker records this platform difference and the rationale for providing a separate “replace” style operation on Windows. os.Rename was not made to provide Unix‑style atomic overwrite on Windows. [1]
- Workarounds:
- Use a library that calls the Windows API (MoveFileEx with MOVEFILE_REPLACE_EXISTING) to perform an atomic replace on Windows (examples: natefinch/atomic and other "replacefile"/"atomic rename" packages). These guarantee either full replace or no change on Windows. [2][4]
- If you attempt manual Remove(dst) + os.Rename(src,dst) be aware that this is not atomic and has race/permission/use‑in‑use failure modes on Windows. [1][5]
Sources
- Go issue: "os: make Rename atomic on Windows" (golang/go). [1]
- github.com/natefinch/atomic (atomic replace/write for Windows using MoveFileEx). [2]
- os.Rename docs / pkg: "Even within the same directory, on non‑Unix platforms Rename is not an atomic operation." [3]
- replace/atomic helper implementations (example pkg docs). [4]
- Discussion/examples showing Windows rename behavior and pitfalls. [5]
🏁 Script executed:
# First, let me examine the code at lines 150-154 and surrounding context
cd /root && find . -type f -name "file.go" -path "*/dotc1z/*" 2>/dev/null | head -5Repository: ConductorOne/baton-sdk
Length of output: 121
🏁 Script executed:
# Find the file
find . -type f -name "file.go" -path "*dotc1z*" 2>/dev/nullRepository: ConductorOne/baton-sdk
Length of output: 86
🏁 Script executed:
# Also search for any files in dotc1z directory
find . -type d -name "dotc1z" 2>/dev/nullRepository: ConductorOne/baton-sdk
Length of output: 78
🏁 Script executed:
# Read the file, focusing on lines around 150-154
cat -n ./pkg/dotc1z/file.go | sed -n '120,160p'Repository: ConductorOne/baton-sdk
Length of output: 1213
🏁 Script executed:
# Read the full function to understand the context - let's see from the function start
cat -n ./pkg/dotc1z/file.go | sed -n '80,160p'Repository: ConductorOne/baton-sdk
Length of output: 2633
🏁 Script executed:
# Check the function signature and see if outputFilePath is expected to exist or not
cat -n ./pkg/dotc1z/file.go | sed -n '60,90p'Repository: ConductorOne/baton-sdk
Length of output: 1197
🏁 Script executed:
# Also search for calls to this function to understand the usage pattern
rg "os\.Rename" ./pkg/dotc1z/file.go -B 5 -A 5Repository: ConductorOne/baton-sdk
Length of output: 385
🏁 Script executed:
# Search for calls to saveC1z function
rg "saveC1z" ./pkg/dotc1z/ -B 2 -A 2Repository: ConductorOne/baton-sdk
Length of output: 4153
🏁 Script executed:
# Also check if there's any code that removes or checks for outputFilePath existence
rg "outputFilePath" ./pkg/dotc1z/file.go -B 2 -A 2Repository: ConductorOne/baton-sdk
Length of output: 953
🏁 Script executed:
# Let me check the specific test that validates atomic write behavior with existing file
cat -n ./pkg/dotc1z/file_test.go | sed -n '50,130p'Repository: ConductorOne/baton-sdk
Length of output: 3341
🏁 Script executed:
# Check if there's a test that specifically tests re-saving to the same outputFilePath
cat -n ./pkg/dotc1z/file_test.go | sed -n '130,180p'Repository: ConductorOne/baton-sdk
Length of output: 721
🏁 Script executed:
# Search for any code that might handle Windows-specific behavior for file operations
rg "windows\|Windows\|GOOS\|runtime.GOOS" ./pkg/dotc1z/ -iRepository: ConductorOne/baton-sdk
Length of output: 48
Address Windows compatibility for atomic file replacement.
On Unix, os.Rename atomically replaces an existing destination file. On Windows, os.Rename returns an error if the destination already exists—it does not support atomic replacement.
If outputFilePath already exists during a re-sync operation, this will fail on Windows. Consider using a platform-aware helper (e.g., natefinch/atomic or a wrapper calling MoveFileEx with MOVEFILE_REPLACE_EXISTING on Windows) to support atomic replacement across platforms.
🤖 Prompt for AI Agents
In pkg/dotc1z/file.go around lines 150 to 154, the code uses os.Rename to move
tmpPath to outputFilePath which is atomic on Unix but fails on Windows if the
destination exists; replace this call with a platform-aware atomic replace
helper (for example use the natefinch/atomic package or implement a small
wrapper that calls MoveFileEx with MOVEFILE_REPLACE_EXISTING on Windows and
os.Rename on Unix), ensure the helper preserves file permissions/ownership where
appropriate, return the helper's error on failure, and add a brief comment
noting the platform-specific behavior and why the helper is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ This is valid. This code is exercised inservice mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kans, thank you for confirming this is a valid concern. Since this code runs in service mode and could potentially be executed on Windows, addressing the atomic rename behavior would improve cross-platform reliability.
Would you like to:
- Address the Windows compatibility in this PR by using a library like
natefinch/atomicor implementing a platform-aware wrapper? - Create a follow-up issue to track this for a future PR?
If you'd prefer option 2, I can create an issue with the details for you.
🧠 Learnings used
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-10-08T21:29:30.695Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `cleanup` method of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-09-03T15:49:24.881Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `cleanup` method of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-09-03T15:53:47.572Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `NewDBCache` function of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-09-03T15:48:24.208Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `DBCache` methods in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-10-08T21:29:30.695Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `DBCache` methods in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:267-278
Timestamp: 2024-10-08T21:29:30.695Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `DBCache` methods in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:267-278
Timestamp: 2024-09-03T15:50:49.204Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `DBCache` methods in the `pkg/uhttp/dbcache.go` file.
Learnt from: mchavez
Repo: ConductorOne/baton-sdk PR: 211
File: pkg/uhttp/dbcache.go:0-0
Timestamp: 2024-09-03T15:51:48.712Z
Learning: The user mchavez has fixed the issue related to improving error messages and adding comments in the `Get` method of the `DBCache` struct in the `pkg/uhttp/dbcache.go` file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
pkg/dotc1z/manager/local/local.go (1)
166-204: LGTM: Atomic write pattern correctly implemented.The refactoring successfully implements the atomic write pattern to prevent partial/corrupt c1z files:
- ✓ Temp file created in destination directory (ensures same filesystem for atomic rename)
- ✓ Data synced before rename (critical for durability, especially on ZFS)
- ✓ File closed explicitly with error handling before rename
- ✓ Atomic rename operation via
os.Rename- ✓ Cleanup of temp file on failure via defer
- ✓ Prevention of double-close by setting
dstFile = nilThis correctly addresses the PR objective of preventing corrupt streams by ensuring the destination file never contains partial data.
Optional: Consider syncing the directory for maximum durability
For maximum durability guarantees, consider syncing the parent directory after the rename to ensure the directory entry is persisted:
// Atomic rename: filePath now has complete, valid data. if err := os.Rename(dstTmpPath, l.filePath); err != nil { return fmt.Errorf("failed to rename temp file to destination: %w", err) } // Optional: sync directory to ensure rename is persisted if dir, err := os.Open(dstDir); err == nil { dir.Sync() dir.Close() }Note: This is often omitted in practice as the durability benefit is marginal in most scenarios, and the code is already robust without it.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pkg/dotc1z/manager/local/local.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: go-test (1.25.2, windows-latest)
🔇 Additional comments (1)
pkg/dotc1z/manager/local/local.go (1)
8-8: LGTM: Import addition supports temp file path handling.The
filepathimport is necessary for theDir()andBase()operations used in the atomic write implementation.
Maybe fix c1z corruption by using atomic writes in saveC1z
we are seeing "corrupt stream, did not find end of stream" errors on c1z files. The issue: saveC1z wrote directly to the output file, so any interruption (crash, error, kill) left a partial/corrupt file that could get uploaded to S3.
write to a temp file first, rename to output on success. If anything fails, the original file is untouched and the temp gets cleaned up.
I cannot attribute/explain, how/if this relates to zfs? Only guess is it some how exposed a latent issue.
also added tests for the atomic behavior and a benchmark.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.