[WIP] Fix stale atomic_update_versions_ trapping version installation in secondary DB#14439
Open
hx235 wants to merge 1 commit into facebook:main from
Conversation
✅ clang-tidy: No findings on changed lines. Completed in 288.1s.
Force-pushed from 3e409ae to b3010a1
hx235 (author): The linter warning is not related to this change.
hx235 (author): Ah, HISTORY.md was not auto-generated; just pushed a new one on top of clean testing signals (excluding the unrelated linter warning).
Fix stale atomic_update_versions_ trapping version installation in secondary DB
Summary:
When `atomic_flush=true`, flush writes version edits for all CFs as an
atomic group in the manifest. `VersionEditHandlerPointInTime` replays
these groups using a staging map (`atomic_update_versions_`): versions
are held there until ALL CFs in the group have valid versions, then
moved to `versions_` for installation. If any CF has missing files, the
map is never cleared.
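The staging behavior above can be sketched in a few lines. This is an illustrative model with simplified types, not the actual RocksDB `VersionEditHandlerPointInTime` code: an empty string stands in for the `nullptr` case where a CF's version could not be built.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch of the all-or-nothing staging described above (illustrative
// types, not RocksDB's). An empty string models a CF whose version
// could not be built because its files are missing.
struct AtomicGroupStager {
  explicit AtomicGroupStager(size_t group_size) : group_size_(group_size) {}

  // Stage one CF's replayed version; the whole group is installed only
  // once every CF in the group has a valid version.
  void Stage(uint32_t cf, std::string version) {
    atomic_update_versions_[cf] = std::move(version);
    TryInstallGroup();
  }

  std::map<uint32_t, std::string> atomic_update_versions_;  // staging map
  std::map<uint32_t, std::string> versions_;                // installed

 private:
  void TryInstallGroup() {
    if (atomic_update_versions_.size() < group_size_) return;
    for (const auto& [cf, v] : atomic_update_versions_) {
      if (v.empty()) return;  // some CF has no valid version: stay staged
    }
    for (auto& [cf, v] : atomic_update_versions_) {
      versions_[cf] = std::move(v);  // all valid: move into versions_
    }
    atomic_update_versions_.clear();
  }
  size_t group_size_;
};
```

With a complete group both versions install and the map clears; with one invalid CF, nothing installs and the map keeps both entries, which is the stale state this PR fixes.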
For secondary/follower DB (`ManifestTailer`), files can become
permanently unavailable when manifest rotation races with primary
compaction. Example: secondary opens MANIFEST-4, then the primary
compacts CF0 (deleting CF0's atomic-flushed SST) and rotates to
MANIFEST-5. When the secondary reads the manifest it already opened,
CF0's file is gone — the atomic group is permanently incomplete:
`atomic_update_versions_` stays {CF0: nullptr, CF1: valid_version}.
Once stuck, all subsequent version creation for CF0 and CF1 is routed
into this stale map, so nothing reaches `versions_` and no versions
are ever installed. Symptoms: remote compaction "Cannot find matched
SST files", secondary DB serving stale data.
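The "stuck" routing can be made concrete with a small model. This is an assumption-laden sketch of the behavior the text describes (later version creation being diverted into the non-empty staging map), not the real replay code path:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Sketch of the trap described above (illustrative, not RocksDB's
// actual routing logic): while the staging map is non-empty, newly
// created versions are staged rather than installed, so a permanently
// incomplete group starves versions_ forever.
struct StuckReplay {
  std::map<uint32_t, std::string> atomic_update_versions_;  // staging map
  std::map<uint32_t, std::string> versions_;                // installed

  void OnVersionCreated(uint32_t cf, std::string v) {
    if (!atomic_update_versions_.empty()) {
      // A group is (still) in flight: route into the staging map.
      atomic_update_versions_[cf] = std::move(v);
    } else {
      versions_[cf] = std::move(v);
    }
  }
};
```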
TryCatchUpWithPrimary cannot be used as a fix because the primary
keeps writing new atomic groups, each of which can also become
incomplete due to the same manifest rotation race. The secondary
would be endlessly chasing a moving target — each catch-up attempt
encounters new incomplete atomic groups that re-pollute the staging
map.
We can relax the all-or-nothing guarantee for ManifestTailer because
it exists for primary best-effort recovery where missing files mean
data loss on a frozen filesystem. For secondary DB, files are missing
because the primary compacted them away — not data loss. Cross-CF
read consistency in secondary is provided at a different layer
(NewIterators/MultiGet use shared sequence numbers), so installing
valid CF versions individually is safe.
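The shared-sequence-number point can be illustrated with a toy read path. The types below are hypothetical stand-ins, not RocksDB's iterator machinery: the claim is only that reading every CF at one shared sequence number yields a consistent cross-CF cut, even when the CFs' versions were installed at different times.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch: cross-CF read consistency comes from a shared
// snapshot sequence number, not from installing CF versions atomically.
struct Entry {
  uint64_t seq;  // sequence number the write was tagged with
  int value;
};

// Return the newest value in `cf` visible at `snapshot_seq`
// (i.e. the entry with the largest seq <= snapshot_seq), or -1.
int ReadAtSeq(const std::vector<Entry>& cf, uint64_t snapshot_seq) {
  int result = -1;
  uint64_t best = 0;
  for (const auto& e : cf) {
    if (e.seq <= snapshot_seq && e.seq >= best) {
      best = e.seq;
      result = e.value;
    }
  }
  return result;
}
```

Both CFs are read at the same sequence number, so the cut is consistent regardless of how far each CF's installed version has individually advanced.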
Fix: override OnAtomicGroupReplayEnd() in ManifestTailer to salvage
valid versions from incomplete atomic groups and clear the staging
map. The all-or-nothing invariant for primary best-effort recovery
is preserved.
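A minimal sketch of the override, with simplified stand-in types (the real `VersionEditHandlerPointInTime`/`ManifestTailer` signatures differ): the base handler keeps the all-or-nothing behavior, while the tailer salvages whatever versions are valid and always clears the staging map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Illustrative sketch of the fix; "" models a CF whose version could
// not be built (missing files). Not the actual RocksDB classes.
struct PointInTimeHandler {
  std::map<uint32_t, std::string> atomic_update_versions_;  // staging map
  std::map<uint32_t, std::string> versions_;                // installed

  virtual ~PointInTimeHandler() = default;

  // Primary best-effort recovery: all-or-nothing. An incomplete group
  // stays staged and nothing is installed.
  virtual void OnAtomicGroupReplayEnd() {
    for (const auto& [cf, v] : atomic_update_versions_) {
      if (v.empty()) return;  // incomplete: leave everything staged
    }
    InstallStagedVersions();
  }

 protected:
  void InstallStagedVersions() {
    for (auto& [cf, v] : atomic_update_versions_) {
      if (!v.empty()) versions_[cf] = std::move(v);  // salvage valid CFs
    }
    atomic_update_versions_.clear();  // map can no longer trap versions
  }
};

// Secondary-DB tailer: salvage valid versions from incomplete groups
// and clear the staging map unconditionally.
struct TailerHandler : PointInTimeHandler {
  void OnAtomicGroupReplayEnd() override { InstallStagedVersions(); }
};
```

The override changes behavior only on the tailer path; the base class, standing in for primary best-effort recovery, still refuses to install an incomplete group.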
Test Plan:
New test `AtomicFlushRemoteCompactionMissingFile` reproduces the race
where secondary DB gets an incomplete atomic group due to concurrent
primary compaction + manifest rotation.
- Before fix: "Cannot find matched SST files for the following file
numbers: 9 14"
- After fix: compaction succeeds, both CFs compacted to L1
Force-pushed from b3010a1 to 5fbf050
hx235 (author): Turning this into a draft to work on "TODO: consider cross-CF read consistency requirement in secondary DB and whether that conflicts with this change."