fix: address multiple correctness and safety bugs #1
Open
Ridwannurudeen wants to merge 1 commit into PluralisResearch:main from
- Fix AttributeError in load_model(): self.models does not exist; changed to self.model
- Fix argparse type=bool which silently breaks CLI flags (bool("False") is True);
replaced with a proper string-to-bool converter in both trainer scripts
- Guard wandb.finish() so it only runs when wandb was actually initialized,
  preventing a crash when wandb_project is not set
- Add weights_only parameter to torch.load calls to resolve PyTorch
deprecation warnings and improve checkpoint loading safety
- Replace list with collections.deque in threadsafe_queue.py for O(1)
popleft instead of O(n) list.pop(0) in the pipeline communication hot path
- Fix duplicate wandb entry in requirements.txt and add missing pyarrow dependency
- Add missing __init__.py files for asyncpp/optim/, asyncpp/runtime/,
and examples/models/ to ensure proper Python package resolution
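The argparse fix described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name str2bool, the accepted spellings, and the False defaults are assumptions. It shows why type=bool misbehaves (bool() on any non-empty string is True) and how a converter that parses the string avoids it.

```python
import argparse


def str2bool(value):
    """Convert a CLI string such as "False" or "yes" into a real bool.

    Hypothetical helper: the converter in the trainer scripts may use a
    different name or accept different spellings.
    """
    if isinstance(value, bool):
        return value
    normalized = value.strip().lower()
    if normalized in ("true", "t", "yes", "y", "1"):
        return True
    if normalized in ("false", "f", "no", "n", "0"):
        return False
    # Reject ambiguous input instead of silently treating it as True.
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser()
    # With type=bool, argparse would call bool("False"), which is True.
    # The defaults below are illustrative assumptions.
    parser.add_argument("--deterministic", type=str2bool, default=False)
    parser.add_argument("--cosine_anneal", type=str2bool, default=False)
    return parser


# The original bug: any non-empty string is truthy.
print(bool("False"))  # True

# The fixed parser reads the flag as intended.
args = build_parser().parse_args(["--deterministic", "False"])
print(args.deterministic)  # False
```

Raising argparse.ArgumentTypeError, rather than returning False for unknown input, lets argparse report a clean usage error for a typo like --deterministic Flase instead of guessing.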
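The wandb guard can be sketched like this. The Config dataclass and Trainer class are hypothetical stand-ins for the repository's own config and setup classes, and wandb is imported lazily only so the sketch runs without wandb installed (the real module presumably imports it at the top level). The only logic taken from the PR is the check: call wandb.finish() on rank 0 only when wandb_project is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    # Hypothetical stand-in: only the field the guard reads.
    wandb_project: Optional[str] = None


class Trainer:
    # Hypothetical stand-in for the class that owns _cleanup().
    def __init__(self, config: Config, rank: int) -> None:
        self.config = config
        self.rank = rank

    def _cleanup(self) -> bool:
        """Finish the wandb run only if one was started.

        Returns True if wandb.finish() was called (returned here only to
        make the behavior easy to check).
        """
        if self.rank == 0 and self.config.wandb_project:
            # Lazy import so this sketch does not require wandb when the
            # guard is False.
            import wandb

            wandb.finish()
            return True
        return False


# No project configured: cleanup is a no-op instead of touching wandb.
print(Trainer(Config(), rank=0)._cleanup())  # False
```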
Branch updated from ca6e6b2 to b83c1c5
Ridwannurudeen (author): Hi! Friendly ping on this PR. It fixes multiple correctness and safety bugs identified during a code review. Happy to walk through the changes or make adjustments if needed. Let me know if this repo is still actively maintained.
Summary
- AttributeError in load_model(): self.models does not exist on DilocoSetup; changed to self.model, which is the actual attribute
- argparse boolean flags: type=bool silently breaks the CLI, because bool("False") evaluates to True in Python. Replaced with a proper string-to-bool converter across both trainer scripts (affects --cosine_anneal, --deterministic, --adaptive_momentum, --sparta_nesterov, --sparta_adaptive_momentum, --buffer_to_cpu)
- wandb.finish() crash: _cleanup() called wandb.finish() unconditionally on rank 0, even when wandb was never initialized (no wandb_project set). It now checks self.config.wandb_project first
- Add weights_only to torch.load calls: resolves PyTorch deprecation warnings about the weights_only default (which changed to True in PyTorch 2.6) and improves checkpoint loading safety
- Use collections.deque in threadsafe_queue.py: replaces list.pop(0) (O(n)) with deque.popleft() (O(1)) in the pipeline communication hot path
- Fix requirements.txt: remove the duplicate wandb entry and add the missing pyarrow dependency (imported in data_utils.py but not listed)
- Add missing __init__.py files for asyncpp/optim/, asyncpp/runtime/, and examples/models/ to ensure proper Python package resolution

Motivation
Found these issues while reading through the codebase after the paper release. The load_model bug and the argparse boolean bug are correctness issues: the first raises an AttributeError at runtime, and the second silently turns an explicit False into True. The remaining fixes improve safety, performance, and packaging hygiene.

Test plan
- load_model references the correct attribute (self.model)
- --deterministic False now correctly evaluates to False
- _cleanup() no longer crashes when wandb_project is None
- deque.popleft() maintains the same FIFO semantics as list.pop(0)
- pyarrow is imported in examples/data_utils.py (lines 5-6)
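The FIFO check in the test plan can be illustrated with a small thread-safe queue built on collections.deque. This is a sketch under assumptions: the class name ThreadSafeQueue and the add/remove method names are hypothetical, and the real threadsafe_queue.py may be structured differently. It shows why the swap is safe: append() plus popleft() preserves first-in, first-out order exactly as append() plus pop(0) did, while popleft() no longer shifts every remaining element.

```python
import threading
from collections import deque
from typing import Any, Deque


class ThreadSafeQueue:
    """FIFO queue guarded by a condition variable (hypothetical sketch)."""

    def __init__(self) -> None:
        # deque.popleft() is O(1); list.pop(0) shifts all remaining items, O(n).
        self._items: Deque[Any] = deque()
        self._cond = threading.Condition()

    def add(self, item: Any) -> None:
        with self._cond:
            self._items.append(item)
            # Wake one consumer blocked in remove().
            self._cond.notify()

    def remove(self) -> Any:
        with self._cond:
            # Loop guards against spurious wakeups.
            while not self._items:
                self._cond.wait()
            return self._items.popleft()

    def __len__(self) -> int:
        with self._cond:
            return len(self._items)


q = ThreadSafeQueue()
for i in range(3):
    q.add(i)
print([q.remove() for _ in range(3)])  # [0, 1, 2]
```

Keeping the lock and condition variable unchanged and swapping only the container limits the change to the pop cost; the blocking and wakeup behavior stays the same.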