fix: address multiple correctness and safety bugs#3
Open
Ridwannurudeen wants to merge 1 commit intoPluralisResearch:mainfrom
Open
fix: address multiple correctness and safety bugs#3Ridwannurudeen wants to merge 1 commit intoPluralisResearch:mainfrom
Ridwannurudeen wants to merge 1 commit intoPluralisResearch:mainfrom
Conversation
- Remove undefined $dd variable in run.bash that causes gpipe baseline to misparse command-line arguments - Close file handles properly when loading JSON config (use with-statement instead of bare open() in both main_with_runtime.py and sync_main.py) - Add weights_only parameter to all torch.load calls to resolve PyTorch >=2.6 deprecation warnings - Replace list with collections.deque in threadsafe_queue.py for O(1) popleft instead of O(n) list.pop(0) in the pipeline communication path - Add missing __init__.py files for optim/ and runtime/ to ensure proper Python package resolution - Fix argparse description from "PyTorch ImageNet Training" to match the actual language model training use case
Author
|
Hi! Friendly ping on this PR — fixes multiple correctness and safety bugs identified during a code review. Happy to walk through the changes or make adjustments if needed. Let me know if this repo is still actively maintained. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
$ddvariable inrun.bash: The gpipe baseline command (line 30) references$ddwhich is never defined, causing the sync baseline to misparse arguments. Removed the stray variable to match the async command structure.json.load(open(...))in bothmain_with_runtime.pyandsync_main.pyleaks file descriptors. Replaced with properwithstatements.weights_onlytotorch.loadcalls: Resolves PyTorch >=2.6FutureWarningdeprecation acrossmain_with_runtime.py,sync_main.py, andoptim/optimizer.py.collections.dequeinthreadsafe_queue.py: Replaceslist.pop(0)(O(n)) withdeque.popleft()(O(1)) in the inter-stage communication path.__init__.pyfiles:optim/andruntime/lack__init__.py, relying on implicit namespace packages. Added explicit init files for robust package resolution.Motivation
Found these while reading through the codebase after the ICML 2025 paper. The
$ddbug silently breaks the gpipe baseline script, and thetorch.loaddeprecation will become an error in future PyTorch versions.Test plan
$ddis not defined anywhere inrun.bash— confirmed stray variable$dd, so removing it from the sync basecmdstr is consistentdeque.popleft()maintains identical FIFO semantics tolist.pop(0)optim/andruntime/are imported viasys.path.append— adding__init__.pymakes this more robust without breaking existing imports