Prepopulate block cache during compaction #14445
mszeszko-meta wants to merge 1 commit into facebook:main from
Conversation
✅ clang-tidy: No findings on changed lines (completed in 194.8s).
Force-pushed from 333c727 to 19e1238 (Compare)
@mszeszko-meta has imported this pull request. If you are a Meta employee, you can view this in D95997952.
@@ -1203,6 +1204,14 @@ struct BlockBasedTableBuilder::Rep {
    switch (table_options.prepopulate_block_cache) {
      case BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly:
        warm_cache = (reason == TableFileCreationReason::kFlush);
Should this be updated? Also, I would suggest moving it below the switch block:
warm_cache = (reason != TableFileCreationReason::kDisable);
This would result in warm_cache = true for table_options.prepopulate_block_cache = kFlushOnly and reason TableFileCreationReason::kCompaction, while we would expect the value to be false.
Some AI review results:
- H3: Missing release notes — no entry in unreleased_history/new_features/ or unreleased_history/public_api_changes/. Required per RocksDB conventions, and especially important given the forward-incompatibility with older OPTIONS file parsers. (3 agents agreed)
- H1: No test verifies BOTTOM priority — the core differentiator (compaction at BOTTOM, flush at LOW) has zero test coverage. The existing MockCache conflates BOTTOM with HIGH. A reproducer using PriorityTrackingCache was built and passes; the feature works correctly but lacks verification. (3 agents agreed)
Summary:
When RocksDB operates with tiered or remote storage (e.g., Warm Storage,
HDFS, S3), reading recently compacted data incurs high-latency remote reads
because compaction output files are not present in the block cache. The
existing `prepopulate_block_cache = kFlushOnly` avoids this for flush output
but leaves compaction output cold until first access.
Add a new `PrepopulateBlockCache::kFlushAndCompaction` enum value that warms
all block types (data, index, filter, compression dict) into the block cache
during both flush and compaction. Flush-warmed blocks use `LOW` priority
(unchanged from kFlushOnly behavior), while compaction-warmed blocks use
`BOTTOM` priority — compaction data is less temporally local than freshly
flushed data, so it should be the first to be evicted when the cache is full.
This gives the remote-read avoidance benefit without risking cache thrashing.
Unlike flush output (which is inherently hot — just written by the user), it
is hard to distinguish hot from cold blocks in compaction output. Warming all
compaction output therefore risks polluting the block cache and evicting
genuinely hot entries. The kFlushAndCompaction mode is recommended only for
use cases where most or all of the database is expected to reside in cache
(e.g., the working set fits in cache). For workloads where only a fraction
of the data is hot, kFlushOnly remains the safer choice.
The enum uses `kFlushAndCompaction` rather than separate `kCompactionOnly` +
`kFlushAndCompaction` values because there is no practical use case for
warming compaction output without also warming flush output. Flush output is
by definition the hottest data (just written by the user), so if a workload
benefits from warming the colder compaction output, it would always benefit
from warming flush output too.
The implementation reuses the existing `InsertBlockInCacheHelper` /
`WarmInCache` infrastructure in BlockBasedTableBuilder. The only internal
change is adding a `warm_cache_priority` field to `Rep` alongside the
existing `warm_cache` bool, and plumbing it through to the `WarmInCache`
call instead of the previously hardcoded `Cache::Priority::LOW`.
Key changes:
- New `PrepopulateBlockCache::kFlushAndCompaction` enum value in table.h
- `Rep::warm_cache_priority` field in BlockBasedTableBuilder for
per-reason priority control
- Serialization support ("kFlushAndCompaction" in string map)
- db_bench support (--prepopulate_block_cache=2)
- Crash test coverage (random choice includes new value)
- Release notes in unreleased_history/
Test Plan:
- New `WarmCacheWithDataBlocksDuringCompaction` test: verifies data blocks
from compaction output are present in the block cache and served without
misses
- New `WarmCachePriorityFlushVsCompaction` test: uses a PriorityTrackingCache
wrapper to verify flush inserts at LOW and compaction inserts at BOTTOM
- Extended `DynamicOptions` test: verifies dynamic switching through
kDisable -> kFlushAndCompaction -> kFlushOnly -> kDisable via SetOptions
- Existing `WarmCacheWithDataBlocksDuringFlush` and parameterized
`WarmCacheWithBlocksDuringFlush` tests continue to pass (kFlushOnly
behavior unchanged)
- db_block_cache_test: 82/82 passed
- options_test: 74/74 passed
- table_test: 6910/6910 passed
Force-pushed from 19e1238 to fef6353 (Compare)
Addressed.