Skip to content

Fix data race in MemTable::GetBloomFilter() on ARM weak memory model#424

Merged
ti-chi-bot[bot] merged 3 commits intotikv:8.10.tikvfrom
hhwyt:fix-bloolfilter-race
Nov 18, 2025
Merged

Fix data race in MemTable::GetBloomFilter() on ARM weak memory model#424
ti-chi-bot[bot] merged 3 commits intotikv:8.10.tikvfrom
hhwyt:fix-bloolfilter-race

Conversation

@hhwyt
Copy link

@hhwyt hhwyt commented Nov 17, 2025

Close tikv/tikv#19106
Fix data race in MemTable::GetBloomFilter() on ARM weak memory model
introduced in #305 (commit 405de0e). Affected version: TiKV 7.1+.
The issue was also proved using Relacy formal verification tool, which confirmed
the data race exists when using relaxed memory ordering, see more detail:
tikv/tikv#19106.

Sulution: Uses release-acquire to prevent data race on weak memory models (e.g.,
ARM). Without it: Thread1 in DynamicBloom::new() first initializes
data_, then stores bloom_filter_ptr_. With relaxed ordering, these two
writes may reach another CPU's cache in reversed order, causing Thread2
to see non-null bloom_filter_ptr_ but uninitialized data_ (crash). With
release-acquire: store(release) ensures all prior writes (including
data_ initialization) are visible before bloom_filter_ptr_ update,
load(acquire) ensures we see them in order, guaranteeing fully
initialized object.

Performance: This load() executes on every read/write call (hot path).
Acquire vs relaxed overhead: On x86, both compile to the same
instruction (MOV), so cost is identical. On ARM, acquire may add a
memory barrier (~1-3 cycles). In practice, since bloom_filter_ptr_ is
written once (during initialization with release) and then only read,
the cache line is typically in Shared state at runtime, meaning both
acquire and relaxed loads read from local cache with similar
performance. The acquire overhead is typically negligible compared to
subsequent bloom filter operations (hash computation and memory access).

@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 17, 2025
@hhwyt hhwyt force-pushed the fix-bloolfilter-race branch from bb5a10e to 6b5bbd2 Compare November 17, 2025 07:06
@hhwyt hhwyt force-pushed the fix-bloolfilter-race branch from 6b5bbd2 to 3d604a5 Compare November 17, 2025 07:11
@hhwyt hhwyt marked this pull request as draft November 17, 2025 07:14
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2025
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 17, 2025
@hhwyt hhwyt force-pushed the fix-bloolfilter-race branch 2 times, most recently from 640465e to 25d02b2 Compare November 17, 2025 07:31
Use release-acquire memory ordering instead of relaxed to ensure
DynamicBloom object is fully initialized before pointer publication.

Problem:
- On ARM (weak memory model), using relaxed ordering allows the pointer
  to be published before DynamicBloom::data_ is initialized
- Readers may see non-null pointer but access uninitialized data_,
  causing crashes

Solution:
- Use memory_order_acquire for load and memory_order_release for store
- This establishes synchronizes-with relationship, guaranteeing readers
  only see fully initialized objects

Performance impact:
- The acquire load executes on every call (hot path in Get/Add operations)
- Overhead is minimal: typically 0-2 cycles on x86 (same as relaxed),
  ~1-3 cycles on ARM (may add memory barrier)
- This overhead is <1% of total latency compared to subsequent bloom
  filter operations (hash computation + memory access)

Signed-off-by: hhwyt <hhwyt1@gmail.com>
@hhwyt hhwyt force-pushed the fix-bloolfilter-race branch from 25d02b2 to 7101ba0 Compare November 17, 2025 07:32
@hhwyt hhwyt marked this pull request as ready for review November 17, 2025 08:08
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 17, 2025
Copy link

@glorv glorv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot bot added lgtm approved and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 17, 2025
@hhwyt hhwyt requested a review from zhangjinpeng87 November 17, 2025 08:42
Signed-off-by: hhwyt <hhwyt1@gmail.com>
@ti-chi-bot ti-chi-bot bot removed the lgtm label Nov 17, 2025
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 17, 2025
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 17, 2025
@hhwyt
Copy link
Author

hhwyt commented Nov 18, 2025

/merge

@ti-chi-bot ti-chi-bot bot removed the lgtm label Nov 18, 2025
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 18, 2025
@ti-chi-bot ti-chi-bot bot added the lgtm label Nov 18, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Nov 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: glorv, hbisheng, LykxSassinator, overvenus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [LykxSassinator,glorv,overvenus]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Nov 18, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Nov 18, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-11-17 07:14:53.681049955 +0000 UTC m=+1291143.124079824: ☑️ agreed by hbisheng.
  • 2025-11-17 08:42:14.147024192 +0000 UTC m=+1296383.590054061: ☑️ agreed by glorv.
  • 2025-11-17 08:53:13.4345248 +0000 UTC m=+1297042.877554678: ✖️🔁 reset by hhwyt.
  • 2025-11-17 08:55:02.441129075 +0000 UTC m=+1297151.884158954: ☑️ agreed by glorv.
  • 2025-11-17 08:55:19.265193595 +0000 UTC m=+1297168.708223484: ☑️ agreed by hbisheng.
  • 2025-11-18 03:57:36.403560138 +0000 UTC m=+1365705.846590017: ✖️🔁 reset by hhwyt.
  • 2025-11-18 03:58:41.700903921 +0000 UTC m=+1365771.143933790: ☑️ agreed by glorv.
  • 2025-11-18 03:58:49.077951726 +0000 UTC m=+1365778.520981595: ☑️ agreed by hbisheng.

Copy link
Member

@Connor1996 Connor1996 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, could we contribute it to upstream?

@hhwyt
Copy link
Author

hhwyt commented Nov 19, 2025

Nice find, could we contribute it to upstream?

This bug only exists in tikv/rocksdb introduced in #305 (commit 405de0e), so we don't need to contribute it to upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TiKV Crash during RocksDB write operation

6 participants