Skip to content

Conversation

@zuston
Copy link
Member

@zuston zuston commented Nov 18, 2025

What changes were proposed in this pull request?

Strict check processed blockIds

Why are the changes needed?

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests

@zuston
Copy link
Member Author

zuston commented Nov 18, 2025

If we disable the multi replicas of shuffle-data, we should strict check the processed blockIds. WDYT? @jerqi

@github-actions
Copy link

Test Results

 2 765 files   -   385   2 765 suites   - 385   5h 30m 54s ⏱️ - 1h 18m 44s
   948 tests  -   265     934 ✅  -   277   1 💤 ±0  0 ❌ ±0   13 🔥 + 12 
13 712 runs   - 1 632  13 527 ✅  - 1 801  15 💤 ±0  0 ❌ ±0  170 🔥 +169 

For more details on these errors, see this check.

Results for commit 0e2d03e. ± Comparison against base commit de55bd9.

This pull request removes 266 and adds 1 tests. Note that renamed tests count towards both.
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testCombineBuffer
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testCommitBlocksWhenMemoryShuffleDisabled
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testOnePartition
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteException
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteNormal
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteNormalWithRemoteMerge
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteNormalWithRemoteMergeAndCombine
org.apache.hadoop.mapred.SortWriteBufferTest ‑ testReadWrite
org.apache.hadoop.mapred.SortWriteBufferTest ‑ testSortBufferIterator
org.apache.hadoop.mapreduce.RssMRUtilsTest ‑ applyDynamicClientConfTest
…
org.apache.uniffle.common.util.RssUtilsTest ‑ testCheckProcessedBlockIds

@jerqi
Copy link
Contributor

jerqi commented Nov 19, 2025

If we disable the multi replicas of shuffle-data, we should strict check the processed blockIds. WDYT? @jerqi

Does it cover the speculation execution and AQE?

@zuston
Copy link
Member Author

zuston commented Nov 19, 2025

If we disable the multi replicas of shuffle-data, we should strict check the processed blockIds. WDYT? @jerqi

Does it cover the speculation execution and AQE?

I haven't take a deep think for this case. Let's froze

@jerqi jerqi requested a review from Copilot December 12, 2025 02:35
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces stricter validation for processed block IDs by ensuring that the processed blocks exactly match the expected blocks, rather than allowing processed blocks to be a superset of expected blocks. The change prevents scenarios where extra, unexpected blocks are processed without detection.

Key Changes:

  • Modified the checkProcessedBlockIds method to detect and reject extra processed blocks that aren't in the expected set
  • Removed an overloaded version of checkProcessedBlockIds that allowed processed blocks to be a superset of expected blocks
  • Added comprehensive unit tests covering both valid and invalid scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
common/src/main/java/org/apache/uniffle/common/util/RssUtils.java Updated validation logic to use destructive checking and removed the permissive overloaded method
common/src/test/java/org/apache/uniffle/common/util/RssUtilsTest.java Added test cases for the stricter block ID validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

expectedCount++;
if (processedBlockIds.contains(it.next())) {
actualCount++;
if (processedBlockIds.remove(it.next())) {
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method now mutates the input parameter processedBlockIds by calling remove(). This is a breaking behavioral change that could affect callers who reuse this Set. Consider either documenting this side effect clearly, cloning the Set before modification, or renaming the parameter to indicate it will be modified (e.g., mutableProcessedBlockIds).

Copilot uses AI. Check for mistakes.
Comment on lines +76 to +77
RssUtils.checkProcessedBlockIds(exceptedBlockIds, processedBlockIds);
fail();
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should verify the exception message or type, not just that any exception is thrown. Consider using assertThrows with a specific exception type check and optionally validating the error message contains expected text like 'illegal' and '1 blocks'.

Copilot uses AI. Check for mistakes.

@Test
public void testCheckProcessedBlockIds() throws Exception {
Roaring64NavigableMap exceptedBlockIds = new Roaring64NavigableMap();
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'excepted' to 'expected'.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants