
[tests, cuda] test: strengthen VRAM target and release assertions for keep sessions #64

Merged
Wangmerlyn merged 4 commits into main from tests/vram-assertions
Feb 15, 2026

Conversation

Owner

@Wangmerlyn Wangmerlyn commented Feb 15, 2026

Summary by CodeRabbit

  • Tests
    • Enhanced CUDA context manager tests to use dynamic polling for VRAM targets instead of fixed sleeps, improving reliability.
    • Added a new test verifying allocation toward a VRAM target and proper release back to baseline with timeout-based checks.
    • Introduced a reusable polling utility for wait-until semantics.
    • Improved diagnostics, tolerances, and descriptive failure messages for memory checks.

Contributor

coderabbitai bot commented Feb 15, 2026

📝 Walkthrough


Replaces fixed sleeps in CUDA controller tests with dynamic, timeout-bounded polling; adds a reusable wait_until helper; introduces pre-checks for available VRAM and delta-based monitoring of memory_allocated/memory_reserved during keep/release scenarios; adds detailed assertions and error messages.

Changes

  • Context manager test (tests/cuda_controller/test_context_manager.py): Increases the target VRAM, calls torch.cuda.empty_cache()/torch.cuda.synchronize(), adds pre-checks for free VRAM, replaces the fixed sleep with delta-based polling loops that monitor memory_allocated and memory_reserved, records before/after values, asserts attainment and release within tolerances, and adds descriptive error messages and a docstring.
  • Keep/release tests (tests/cuda_controller/test_keep_and_release.py): Adds a new test, test_cuda_controller_respects_vram_target_during_keep, that uses polling to verify allocation toward a VRAM target during keep() and release back to baseline; adjusts imports.
  • Test utility (tests/polling.py): Adds a wait_until(predicate, timeout_s=3.0, interval_s=0.05) helper used by the tests to poll a boolean predicate until success or timeout.
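
For orientation, a minimal sketch of what this wait_until helper plausibly looks like, reconstructed from the signature above and the diff hunks quoted later in this thread (the merged version polls with time.time(); a review nit below suggests time.monotonic()):

import time


def wait_until(predicate, timeout_s: float = 3.0, interval_s: float = 0.05) -> bool:
    """Poll `predicate` every `interval_s` seconds until it returns True or `timeout_s` elapses."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False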

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I nudge the GPU, I count each byte,
I wait, I watch, till usage looks right.
No lazy sleeps — I hop and I poll,
Memory rises, then falls to its hole.
Hooray for tests that keep things tight!

🚥 Pre-merge checks: 3 passed, 1 failed
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately and specifically describes the main change: strengthening VRAM target and release assertions for keep sessions in the test suite.
  • Merge Conflict Detection (✅ Passed): No merge conflicts detected when merging into main.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
tests/polling.py (1)

1-16: LGTM — clean reusable polling helper.

Tiny nit: time.monotonic() is preferred over time.time() for measuring elapsed durations, as it's immune to system clock adjustments. Not critical for tests, but it's a good habit.

♻️ Use monotonic clock
-    deadline = time.time() + timeout_s
-    while time.time() < deadline:
+    deadline = time.monotonic() + timeout_s
+    while time.monotonic() < deadline:


@gemini-code-assist
Contributor

Summary of Changes

Hello @Wangmerlyn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly strengthens the testing of CUDA GPU VRAM management. It refactors an existing test to include more precise assertions for VRAM allocation and deallocation during context manager usage and introduces a new dedicated test to verify that the keep() and release() methods of the CudaGPUController correctly manage VRAM targets. These changes improve confidence in the CudaGPUController's VRAM management by verifying that VRAM is accurately reserved and freed.

Highlights

  • Enhanced VRAM Target Assertions: The test_cuda_controller_context_manager has been significantly refactored to include more rigorous and dynamic assertions for VRAM allocation and deallocation, ensuring that the controller accurately manages VRAM during its lifecycle.
  • New VRAM Keep and Release Test: A new test, test_cuda_controller_respects_vram_target_during_keep, was introduced to specifically validate that the keep() and release() methods of the CudaGPUController correctly consume and free VRAM according to the specified targets.
  • Asynchronous Test Utility: A new helper function, _wait_until, was added to facilitate robust asynchronous checks within the tests, allowing for more reliable assertions on VRAM state changes over time.


Changelog
  • tests/cuda_controller/test_context_manager.py
    • Refactored test_cuda_controller_context_manager to include detailed assertions for VRAM allocation and release.
    • Increased vram_to_keep from '8MB' to '64MB' for a more robust test.
    • Added checks for insufficient free VRAM and dynamic waiting for VRAM targets to be met and released.
  • tests/cuda_controller/test_keep_and_release.py
    • Added a new helper function _wait_until to facilitate asynchronous checks in tests.
    • Introduced test_cuda_controller_respects_vram_target_during_keep to specifically test the keep() and release() methods' adherence to VRAM targets.
Activity
  • Initial commit of the changes.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9692d7111e


alloc_tolerance = 8 * 1024 * 1024
reserve_tolerance = 16 * 1024 * 1024

ctrl.keep()


P1: Always release controller when post-keep assertions fail

test_cuda_controller_respects_vram_target_during_keep starts the background keep loop with ctrl.keep(), but cleanup is only done at the end of the happy path. If any assertion before ctrl.release() fails (for example the 3s target wait timing out on a busy CI GPU), the daemon thread keeps running and continues holding VRAM, which can cascade into unrelated CUDA test failures in the same pytest process. Use try/finally (or with ctrl:) so release is guaranteed on failure paths.
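
A minimal sketch of the failure-safe shape Codex is suggesting; wait_until and vram_target_reached are placeholders here, not the test's actual helpers:

ctrl.keep()
try:
    # placeholder assertion standing in for the test's real polling checks
    assert wait_until(lambda: vram_target_reached(ctrl)), "VRAM target not reached in time"
finally:
    # release always runs, so the background keep loop cannot keep holding
    # VRAM into later tests when an assertion above fails
    ctrl.release()

Alternatively, with ctrl: gives the same guarantee if the controller's context manager releases on exit, as the context-manager test suggests.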


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly strengthens the tests for VRAM management by adding detailed assertions for memory allocation and release, which is a great improvement. My review includes two suggestions to enhance the consistency and reduce code duplication within the newly added test logic. One suggestion is to refactor the duplicated polling logic into a shared helper function, and the other is to make the wait conditions across the tests more consistent and robust.

Comment on lines 43 to 60
while time.time() < deadline:
alloc_delta = max(
0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated
)
reserved_delta = max(
0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved
)
peak_alloc_delta = max(peak_alloc_delta, alloc_delta)
peak_reserved_delta = max(peak_reserved_delta, reserved_delta)

# allocated should track payload; reserved may be larger due to allocator blocks
if (
alloc_delta >= int(target_bytes * 0.95)
and reserved_delta >= alloc_delta
):
reached_target = True
break
time.sleep(0.05)


Severity: medium

This polling loop is very similar to the release-checking loop below (lines 69-81). Furthermore, in test_keep_and_release.py, you've introduced a _wait_until helper function which abstracts this exact polling pattern.

To improve consistency and reduce code duplication across the test suite, consider refactoring this test to also use a polling helper. You could move _wait_until to a shared test utility file and import it in both test files.

This would make the test logic cleaner and easier to maintain.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/cuda_controller/test_context_manager.py`:
- Around lines 67-85: Initialize alloc_delta_after and reserved_delta_after to safe default values (e.g., 0) before entering the release-polling loop (the one gated on release_deadline that reads torch.cuda.memory_allocated(ctrl.rank) and torch.cuda.memory_reserved(ctrl.rank)), so both names are always defined for the final assertion. Let the loop update them as it currently does and keep the final assert that references alloc_delta_after and reserved_delta_after unchanged.
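
A sketch of that fix using the names quoted above; the loop's exit condition is a guess at the existing release check, not the file's exact code:

alloc_delta_after = 0  # safe defaults so the final assertion is always defined,
reserved_delta_after = 0  # even if the loop body never runs before the deadline
while time.time() < release_deadline:
    alloc_delta_after = max(
        0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated
    )
    reserved_delta_after = max(
        0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved
    )
    # assumed exit condition: memory has dropped back within the tolerances
    if alloc_delta_after <= alloc_tolerance and reserved_delta_after <= reserve_tolerance:
        break
    time.sleep(0.05)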
🧹 Nitpick comments (4)
tests/cuda_controller/test_keep_and_release.py (3)

8-14: Clean polling helper – consider using time.monotonic().

time.time() can jump on NTP adjustments; time.monotonic() is the idiomatic choice for measuring elapsed durations. Unlikely to matter at 3 s scale, but worth aligning with best practice.

♻️ Suggested tweak
 def _wait_until(predicate, timeout_s: float = 3.0, interval_s: float = 0.05) -> bool:
-    deadline = time.time() + timeout_s
-    while time.time() < deadline:
+    deadline = time.monotonic() + timeout_s
+    while time.monotonic() < deadline:
         if predicate():
             return True
         time.sleep(interval_s)
     return False

47-50: Consider adding a large_memory marker or documenting the expected footprint.

The coding guidelines mention a large_memory pytest marker for opt-in heavy VRAM tests. At 32 MB this test is lightweight, but if the project convention is to mark any test that allocates real VRAM, it may be worth adding. A brief comment documenting the expected allocation (~32 MB) would also help future maintainers gauge CI requirements.

As per coding guidelines: "Respect existing pytest markers: rocm for ROCm-only tests, large_memory for opt-in heavy VRAM tests."


85-88: Redundant re-assertion after _wait_until already confirmed the condition.

Lines 87–88 re-check the same >= 0.95 * target_bytes condition that _wait_until already verified (and reached was asserted True on line 83). The allocation could theoretically decrease between the poll and these lines if the keep-loop is cycling, making this a slightly stricter "still true right now" check. If that's intentional, a short comment would clarify the intent; otherwise these lines are just noise.

tests/cuda_controller/test_context_manager.py (1)

39-65: Consider extracting the keep-phase polling into a shared helper.

This inline polling loop duplicates the spirit of _wait_until from test_keep_and_release.py but adds peak-delta tracking. A small shared helper (e.g., in a tests/cuda_controller/conftest.py or a _helpers.py module) accepting an optional on_sample callback could serve both files, reducing copy-paste drift as tolerances or timeouts evolve.
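
One possible shape for such a shared helper, sketched under the assumption that an optional per-iteration callback is enough to cover the peak-delta tracking; the name and placement are for the maintainers to decide:

import time
from typing import Callable, Optional


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 3.0,
    interval_s: float = 0.05,
    on_sample: Optional[Callable[[], None]] = None,
) -> bool:
    """Poll `predicate` until True or timeout; call `on_sample` each iteration (e.g. to record peak deltas)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if on_sample is not None:
            on_sample()
        if predicate():
            return True
        time.sleep(interval_s)
    return False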

Wangmerlyn and others added 3 commits February 15, 2026 22:07
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@Wangmerlyn Wangmerlyn merged commit 6a39098 into main Feb 15, 2026
5 checks passed