
Prepopulate block cache during compaction#14445

Open
mszeszko-meta wants to merge 1 commit into facebook:main from mszeszko-meta:populate-block-cache-on-compaction

Conversation


@mszeszko-meta mszeszko-meta commented Mar 10, 2026

Summary

When RocksDB operates with tiered or remote storage (e.g., Warm Storage, HDFS, S3), reading recently compacted data incurs high-latency remote reads because compaction output files are not present in the block cache. The existing `prepopulate_block_cache = kFlushOnly` avoids this for flush output but leaves compaction output cold until first access.

Add a new `PrepopulateBlockCache::kFlushAndCompaction` enum value that warms all block types (data, index, filter, compression dict) into the block cache during both flush and compaction. Flush-warmed blocks use `LOW` priority (unchanged from `kFlushOnly` behavior), while compaction-warmed blocks use `BOTTOM` priority: compaction data is less temporally local than freshly flushed data, so it should be the first to be evicted when the cache is full. This gives the remote-read avoidance benefit without risking cache thrashing.

The enum uses `kFlushAndCompaction` rather than separate `kCompactionOnly` + `kFlushAndCompaction` values because there is no practical use case for warming compaction output without also warming flush output. Flush output is by definition the hottest data (just written by the user), so if a workload benefits from warming the colder compaction output, it would always benefit from warming flush output too.

The implementation reuses the existing `InsertBlockInCacheHelper` / `WarmInCache` infrastructure in `BlockBasedTableBuilder`. The only internal change is adding a `warm_cache_priority` field to `Rep` alongside the existing `warm_cache` bool, and plumbing it through to the `WarmInCache` call instead of the previously hardcoded `Cache::Priority::LOW`.

Key changes

  • New `PrepopulateBlockCache::kFlushAndCompaction` enum value in `table.h`
  • `Rep::warm_cache_priority` field in `BlockBasedTableBuilder` for per-reason priority control
  • Serialization support (`"kFlushAndCompaction"` in string map)
  • db_bench support (`--prepopulate_block_cache=2`)
  • Crash test coverage (random choice includes new value)
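
As a sketch of how this surfaces to users, based on the serialization and db_bench bullets above (the options-string form follows the usual RocksDB `block_based_table_factory` convention and is an assumption, not quoted from the PR):

```
# Options string / OPTIONS file (BlockBasedTableOptions):
block_based_table_factory={prepopulate_block_cache=kFlushAndCompaction;}

# db_bench flag (2 selects the new enum value per this PR):
--prepopulate_block_cache=2
```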

NOTE: Unlike flush output (which is inherently hot, having just been written by the user), it is hard to distinguish hot from cold blocks in compaction output. Warming all compaction output therefore risks polluting the block cache and evicting genuinely hot entries. The `kFlushAndCompaction` mode is recommended only for use cases where most or all of the database is expected to reside in cache (e.g., the working set fits in cache). For workloads where only a fraction of the data is hot, `kFlushOnly` remains the safer choice.

Test Plan

  • New `WarmCacheWithDataBlocksDuringCompaction` test: verifies data blocks from compaction output are present in the block cache and served without misses
  • Extended `DynamicOptions` test: verifies dynamic switching through `kDisable` -> `kFlushAndCompaction` -> `kFlushOnly` -> `kDisable` via `SetOptions`
  • Existing `WarmCacheWithDataBlocksDuringFlush` and parameterized `WarmCacheWithBlocksDuringFlush` tests continue to pass (`kFlushOnly` behavior unchanged)
  • `db_block_cache_test`: 81/81 passed
  • `options_test`: 74/74 passed
  • `table_test`: 6910/6910 passed


github-actions bot commented Mar 10, 2026

✅ clang-tidy: No findings on changed lines

Completed in 194.8s.

@mszeszko-meta force-pushed the populate-block-cache-on-compaction branch from 333c727 to 19e1238 on March 10, 2026 18:51
@mszeszko-meta changed the title from "Support prepopulating block cache during compaction" to "Prepopulate block cache during compaction" on Mar 10, 2026

meta-codesync bot commented Mar 10, 2026

@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this in D95997952.

```diff
@@ -1203,6 +1204,14 @@ struct BlockBasedTableBuilder::Rep {
   switch (table_options.prepopulate_block_cache) {
     case BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly:
       warm_cache = (reason == TableFileCreationReason::kFlush);
```
Contributor

Should this be updated? Also, I would suggest moving this below the switch block.

`warm_cache = (reason != TableFileCreationReason::kDisable);`

Contributor Author
@mszeszko-meta mszeszko-meta Mar 10, 2026


This would result in `warm_cache = true` for `table_options.prepopulate_block_cache = kFlushOnly` and reason `TableFileCreationReason::kCompaction`, while we would expect the value to be false.

@xingbowang
Contributor

Some AI review results:

H3: Missing release notes — No entry in unreleased_history/new_features/ or unreleased_history/public_api_changes/. Required per RocksDB conventions, especially important for the forward-incompatibility with older OPTIONS file parsers. (3 agents agreed)

H1: No test verifies BOTTOM priority — The core differentiator (compaction at BOTTOM, flush at LOW) has zero test coverage. The existing MockCache conflates BOTTOM with HIGH. A reproducer using PriorityTrackingCache was built and passes — the feature works correctly but lacks verification. (3 agents agreed)

Summary:
When RocksDB operates with tiered or remote storage (e.g., Warm Storage,
HDFS, S3), reading recently compacted data incurs high-latency remote reads
because compaction output files are not present in the block cache. The
existing `prepopulate_block_cache = kFlushOnly` avoids this for flush output
but leaves compaction output cold until first access.

Add a new `PrepopulateBlockCache::kFlushAndCompaction` enum value that warms
all block types (data, index, filter, compression dict) into the block cache
during both flush and compaction. Flush-warmed blocks use `LOW` priority
(unchanged from kFlushOnly behavior), while compaction-warmed blocks use
`BOTTOM` priority — compaction data is less temporally local than freshly
flushed data, so it should be the first to be evicted when the cache is full.
This gives the remote-read avoidance benefit without risking cache thrashing.

Unlike flush output (which is inherently hot — just written by the user), it
is hard to distinguish hot from cold blocks in compaction output. Warming all
compaction output therefore risks polluting the block cache and evicting
genuinely hot entries. The kFlushAndCompaction mode is recommended only for
use cases where most or all of the database is expected to reside in cache
(e.g., the working set fits in cache). For workloads where only a fraction
of the data is hot, kFlushOnly remains the safer choice.

The enum uses `kFlushAndCompaction` rather than separate `kCompactionOnly` +
`kFlushAndCompaction` values because there is no practical use case for
warming compaction output without also warming flush output. Flush output is
by definition the hottest data (just written by the user), so if a workload
benefits from warming the colder compaction output, it would always benefit
from warming flush output too.

The implementation reuses the existing `InsertBlockInCacheHelper` /
`WarmInCache` infrastructure in BlockBasedTableBuilder. The only internal
change is adding a `warm_cache_priority` field to `Rep` alongside the
existing `warm_cache` bool, and plumbing it through to the `WarmInCache`
call instead of the previously hardcoded `Cache::Priority::LOW`.

Key changes:
- New `PrepopulateBlockCache::kFlushAndCompaction` enum value in table.h
- `Rep::warm_cache_priority` field in BlockBasedTableBuilder for
  per-reason priority control
- Serialization support ("kFlushAndCompaction" in string map)
- db_bench support (--prepopulate_block_cache=2)
- Crash test coverage (random choice includes new value)
- Release notes in unreleased_history/

Test Plan:
- New `WarmCacheWithDataBlocksDuringCompaction` test: verifies data blocks
  from compaction output are present in the block cache and served without
  misses
- New `WarmCachePriorityFlushVsCompaction` test: uses a PriorityTrackingCache
  wrapper to verify flush inserts at LOW and compaction inserts at BOTTOM
- Extended `DynamicOptions` test: verifies dynamic switching through
  kDisable -> kFlushAndCompaction -> kFlushOnly -> kDisable via SetOptions
- Existing `WarmCacheWithDataBlocksDuringFlush` and parameterized
  `WarmCacheWithBlocksDuringFlush` tests continue to pass (kFlushOnly
  behavior unchanged)
- db_block_cache_test: 82/82 passed
- options_test: 74/74 passed
- table_test: 6910/6910 passed
@mszeszko-meta mszeszko-meta force-pushed the populate-block-cache-on-compaction branch from 19e1238 to fef6353 Compare March 10, 2026 21:01
@mszeszko-meta
Contributor Author

> Some AI review results:
>
> H3: Missing release notes — No entry in unreleased_history/new_features/ or unreleased_history/public_api_changes/. Required per RocksDB conventions, especially important for the forward-incompatibility with older OPTIONS file parsers. (3 agents agreed)
>
> H1: No test verifies BOTTOM priority — The core differentiator (compaction at BOTTOM, flush at LOW) has zero test coverage. The existing MockCache conflates BOTTOM with HIGH. A reproducer using PriorityTrackingCache was built and passes — the feature works correctly but lacks verification. (3 agents agreed)

Addressed.
