Fix GatherND division by zero when batch dimensions mismatch#27090
Fix GatherND division by zero when batch dimensions mismatch#27090chaya2350 wants to merge 16 commits intomicrosoft:mainfrom
Conversation
Fixes microsoft#23828 Added validation to check: - num_batches is not zero - num_slices is divisible by num_batches Before this fix, mismatched batch dimensions caused a crash due to division by zero.
|
@fdwr @justinchuby Could you please review this PR? It fixes the issue you reported. Thanks! 🙂 |
There was a problem hiding this comment.
Pull request overview
This PR fixes a division by zero crash in the GatherND operator when batch dimensions between input and indices tensors are mismatched. The fix adds validation to ensure num_batches is non-zero and that num_slices is evenly divisible by num_batches before performing the division.
Changes:
- Added validation checks in
gather_nd.ccto prevent division by zero - Added a unit test to verify the fix handles batch dimension mismatches correctly
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| onnxruntime/core/providers/cpu/tensor/gather_nd.cc | Added validation to check num_batches is not zero and num_slices is divisible by num_batches before division |
| onnxruntime/test/providers/cpu/tensor/gather_nd_op_test.cc | Added test case to verify batch dimension mismatch returns an error instead of crashing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
I've addressed the Copilot comment. Ready for re-review! 🙂 |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address Copilot review: added test case to cover the num_batches == 0 validation path.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Head branch was pushed to by a user without write access
Head branch was pushed to by a user without write access
Head branch was pushed to by a user without write access
|
Why was this closed? |
|
@justinchuby - Weird, the branch was deleted too. |
|
This was a mistake... |
|
@chaya2350 could you run |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@chaya2350, please format the code with LGTM. The changes are safe and improve the robustness of the |
|
Looks like there are some errors with CUDA. @tianleiwu do you have suggestions? |
|
Hi,
|
|
Based on the team's suggestions, could you take a look at and ensure the error strings match with the cpu provider? |
|
@chaya2350, test can explicitly run on CPU provider only like see onnxruntime/onnxruntime/test/unittest_util/base_tester.h Lines 576 to 581 in 735e69a |
Fixes #23828
Added validation to check:
Before this fix, mismatched batch dimensions caused a crash due to division by zero.
Description
This PR fixes a division by zero crash in the GatherND operator when batch dimensions mismatch between input and indices tensors.
Changes made:
Added validation in gather_nd.cc to check that num_batches is not zero before division
Added validation that num_slices is divisible by num_batches
Added a unit test to verify the fix
Motivation and Context
Description
Fixes #23828
When batch_dims is set but the actual batch dimensions of the input tensor and indices tensor don't align correctly, the code performs a division that can result in division by zero, causing a crash.
For example, with:
Input shape: [2, 2, 2]
Indices shape: [2, 1]
batch_dims=1
The calculation num_slices / num_batches would crash if num_batches is 0, or produce unexpected results if they don't divide evenly.
This fix returns a clear error message instead of crashing.