fix: address multiple correctness and safety bugs by Ridwannurudeen · Pull Request #1 · PluralisResearch/AsyncMesh

Ridwannurudeen · 2026-02-07T20:45:07Z

Summary

Fix AttributeError in load_model(): self.models does not exist on DilocoSetup, changed to self.model which is the actual attribute
Fix broken argparse boolean flags: type=bool silently breaks CLI — bool("False") evaluates to True in Python. Replaced with a proper string-to-bool converter across both trainer scripts (affects --cosine_anneal, --deterministic, --adaptive_momentum, --sparta_nesterov, --sparta_adaptive_momentum, --buffer_to_cpu)
Guard wandb.finish() crash: _cleanup() called wandb.finish() unconditionally on rank 0, even when wandb was never initialized (no wandb_project set). Now checks self.config.wandb_project first
Add weights_only to torch.load calls: Resolves PyTorch >=2.6 deprecation warnings and improves checkpoint loading safety
Use collections.deque in threadsafe_queue.py: Replaces list.pop(0) (O(n)) with deque.popleft() (O(1)) in the pipeline communication hot path
Fix duplicate wandb in requirements.txt and add missing pyarrow dependency (imported in data_utils.py but not listed)
Add missing __init__.py files for asyncpp/optim/, asyncpp/runtime/, and examples/models/ to ensure proper Python package resolution

Motivation

Found these issues while reading through the codebase after the paper release. The load_model bug and argparse boolean bug are correctness issues that would cause runtime failures. The remaining fixes improve safety, performance, and packaging hygiene.

Test plan

Verified load_model references the correct attribute (self.model)
Verified argparse boolean parsing: --deterministic False now correctly evaluates to False
Verified _cleanup() no longer crashes when wandb_project is None
Confirmed deque.popleft() maintains the same FIFO semantics as list.pop(0)
Confirmed pyarrow is imported in examples/data_utils.py (lines 5-6)

- Fix AttributeError in load_model(): self.models does not exist, changed to self.model - Fix argparse type=bool which silently breaks CLI flags (bool("False") is True); replaced with a proper string-to-bool converter in both trainer scripts - Guard wandb.finish() to only run when wandb was actually initialized, preventing crash when wandb_project is not set - Add weights_only parameter to torch.load calls to resolve PyTorch deprecation warnings and improve checkpoint loading safety - Replace list with collections.deque in threadsafe_queue.py for O(1) popleft instead of O(n) list.pop(0) in the pipeline communication hot path - Fix duplicate wandb entry in requirements.txt and add missing pyarrow dependency - Add missing __init__.py files for asyncpp/optim/, asyncpp/runtime/, and examples/models/ to ensure proper Python package resolution

Ridwannurudeen · 2026-03-02T08:26:12Z

Hi! Friendly ping on this PR — fixes multiple correctness and safety bugs identified during a code review. Happy to walk through the changes or make adjustments if needed. Let me know if this repo is still actively maintained.

Ridwannurudeen force-pushed the fix/correctness-and-safety-bugs branch from ca6e6b2 to b83c1c5 Compare February 9, 2026 18:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: address multiple correctness and safety bugs#1

fix: address multiple correctness and safety bugs#1
Ridwannurudeen wants to merge 1 commit intoPluralisResearch:mainfrom
Ridwannurudeen:fix/correctness-and-safety-bugs

Ridwannurudeen commented Feb 7, 2026 •

edited

Loading

Uh oh!

Ridwannurudeen commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Ridwannurudeen commented Feb 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Test plan

Uh oh!

Ridwannurudeen commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Ridwannurudeen commented Feb 7, 2026 •

edited

Loading