[tests, cuda] test: strengthen VRAM target and release assertions for keep sessions #64
Wangmerlyn merged 4 commits into main
Conversation
📝 Walkthrough

Replaces fixed sleeps in CUDA controller tests with dynamic, timeout-bounded polling; adds a reusable wait_until helper; introduces pre-checks for available VRAM and delta-based monitoring of memory_allocated/memory_reserved during keep/release scenarios; adds detailed assertions and error messages.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Summary of Changes

This pull request significantly strengthens the testing of CUDA GPU VRAM management. It refactors an existing test to include more precise assertions for VRAM allocation and deallocation during context manager usage and introduces a new dedicated test to verify that the […]
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9692d7111e
```python
alloc_tolerance = 8 * 1024 * 1024
reserve_tolerance = 16 * 1024 * 1024

ctrl.keep()
```
Always release controller when post-keep assertions fail
test_cuda_controller_respects_vram_target_during_keep starts the background keep loop with ctrl.keep(), but cleanup is only done at the end of the happy path. If any assertion before ctrl.release() fails (for example the 3s target wait timing out on a busy CI GPU), the daemon thread keeps running and continues holding VRAM, which can cascade into unrelated CUDA test failures in the same pytest process. Use try/finally (or with ctrl:) so release is guaranteed on failure paths.
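A minimal sketch of the suggested guard, reusing the ctrl.keep()/ctrl.release() calls from the test (the assertion step is a hypothetical placeholder, not the PR's actual checks):

```python
ctrl.keep()
try:
    # Post-keep assertions go here. If any of them fail (e.g. the 3 s
    # target wait times out on a busy CI GPU), execution still reaches
    # the cleanup below instead of leaving the keep loop holding VRAM.
    run_post_keep_assertions()  # hypothetical stand-in for the checks
finally:
    ctrl.release()

# Equivalent, if the controller supports the context-manager protocol
# as the comment's "with ctrl:" alternative implies:
with ctrl:
    run_post_keep_assertions()
```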
Code Review
This pull request significantly strengthens the tests for VRAM management by adding detailed assertions for memory allocation and release, which is a great improvement. My review includes two suggestions to enhance the consistency and reduce code duplication within the newly added test logic. One suggestion is to refactor the duplicated polling logic into a shared helper function, and the other is to make the wait conditions across the tests more consistent and robust.
```python
while time.time() < deadline:
    alloc_delta = max(
        0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated
    )
    reserved_delta = max(
        0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved
    )
    peak_alloc_delta = max(peak_alloc_delta, alloc_delta)
    peak_reserved_delta = max(peak_reserved_delta, reserved_delta)

    # allocated should track payload; reserved may be larger due allocator blocks
    if (
        alloc_delta >= int(target_bytes * 0.95)
        and reserved_delta >= alloc_delta
    ):
        reached_target = True
        break
    time.sleep(0.05)
```
This polling loop is very similar to the release-checking loop below (lines 69-81). Furthermore, in test_keep_and_release.py, you've introduced a _wait_until helper function which abstracts this exact polling pattern.
To improve consistency and reduce code duplication across the test suite, consider refactoring this test to also use a polling helper. You could move _wait_until to a shared test utility file and import it in both test files.
This would make the test logic cleaner and easier to maintain.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@tests/cuda_controller/test_context_manager.py`:
- Around lines 67–85: Initialize alloc_delta_after and reserved_delta_after to safe default values (e.g., 0) before entering the while loop so they are always defined for the final assertion. The loop bounded by release_deadline, which reads torch.cuda.memory_allocated(ctrl.rank) and torch.cuda.memory_reserved(ctrl.rank), can keep updating them as it currently does, and the final assert that references alloc_delta_after and reserved_delta_after stays unchanged.
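A compact sketch of that fix, using the names visible in the quoted hunks (the loop's exit condition and final asserts here are assumptions about the surrounding test, not its exact code):

```python
# Defaults so the final assertion is well-defined even if the polling
# loop body never executes (e.g. the deadline has already passed).
alloc_delta_after = 0
reserved_delta_after = 0

while time.time() < release_deadline:
    alloc_delta_after = max(
        0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated
    )
    reserved_delta_after = max(
        0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved
    )
    if alloc_delta_after <= alloc_tolerance and reserved_delta_after <= reserve_tolerance:
        break  # memory released within tolerance
    time.sleep(0.05)

# The final assert referencing the deltas stays as in the PR.
assert alloc_delta_after <= alloc_tolerance
assert reserved_delta_after <= reserve_tolerance
```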
🧹 Nitpick comments (4)
tests/cuda_controller/test_keep_and_release.py (3)
8-14: Clean polling helper – consider using time.monotonic().

time.time() can jump on NTP adjustments; time.monotonic() is the idiomatic choice for measuring elapsed durations. Unlikely to matter at a 3 s scale, but worth aligning with best practice.

♻️ Suggested tweak:

```diff
 def _wait_until(predicate, timeout_s: float = 3.0, interval_s: float = 0.05) -> bool:
-    deadline = time.time() + timeout_s
-    while time.time() < deadline:
+    deadline = time.monotonic() + timeout_s
+    while time.monotonic() < deadline:
         if predicate():
             return True
         time.sleep(interval_s)
     return False
```
47-50: Consider adding a large_memory marker or documenting the expected footprint.

The coding guidelines mention a large_memory pytest marker for opt-in heavy VRAM tests. At 32 MB this test is lightweight, but if the project convention is to mark any test that allocates real VRAM, it may be worth adding. A brief comment documenting the expected allocation (~32 MB) would also help future maintainers gauge CI requirements.

As per coding guidelines: "Respect existing pytest markers: rocm for ROCm-only tests, large_memory for opt-in heavy VRAM tests."
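If the project does adopt the marker here, the opt-in annotation might look like the following sketch (the decorator placement is assumed; the test name is taken from the review comments above, and large_memory would need to be registered in the project's pytest configuration):

```python
import pytest


@pytest.mark.large_memory  # opt-in heavy-VRAM test per the coding guidelines
def test_cuda_controller_respects_vram_target_during_keep():
    # Expected footprint: roughly 32 MB of allocated VRAM while the
    # keep loop holds the payload.
    ...
```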
85-88: Redundant re-assertion after _wait_until already confirmed the condition.

Lines 87–88 re-check the same >= 0.95 * target_bytes condition that _wait_until already verified (and reached was asserted True on line 83). The allocation could theoretically decrease between the poll and these lines if the keep-loop is cycling, making this a slightly stricter "still true right now" check. If that's intentional, a short comment would clarify the intent; otherwise these lines are just noise.

tests/cuda_controller/test_context_manager.py (1)
39-65: Consider extracting the keep-phase polling into a shared helper.

This inline polling loop duplicates the spirit of _wait_until from test_keep_and_release.py but adds peak-delta tracking. A small shared helper (e.g., in a tests/cuda_controller/conftest.py or a _helpers.py module) accepting an optional on_sample callback could serve both files, reducing copy-paste drift as tolerances or timeouts evolve.
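One way such a shared, callback-aware helper and its keep-phase call site might look (this also covers the earlier Gemini suggestion to share the polling logic; the on_sample name comes from the nitpick, while the module layout and call-site details are assumptions):

```python
# tests/cuda_controller/_helpers.py (hypothetical shared module)
import time


def wait_until(predicate, timeout_s=3.0, interval_s=0.05, on_sample=None):
    """Poll `predicate` until it is true or `timeout_s` elapses; call
    `on_sample` on every iteration, e.g. to record peak memory deltas."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if on_sample is not None:
            on_sample()
        if predicate():
            return True
        time.sleep(interval_s)
    return False


# Sketch of the keep-phase call site in test_context_manager.py;
# ctrl, before_allocated, before_reserved and target_bytes come from
# the test setup quoted in the hunks above.
import torch

peaks = {"alloc": 0, "reserved": 0}

def _sample():
    alloc = max(0, torch.cuda.memory_allocated(ctrl.rank) - before_allocated)
    reserved = max(0, torch.cuda.memory_reserved(ctrl.rank) - before_reserved)
    peaks["alloc"] = max(peaks["alloc"], alloc)
    peaks["reserved"] = max(peaks["reserved"], reserved)

reached_target = wait_until(
    lambda: torch.cuda.memory_allocated(ctrl.rank) - before_allocated
    >= int(target_bytes * 0.95),
    on_sample=_sample,
)
```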
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>