Skip to content

Comments

fix: race conditions and bugs in Simple/Normal queue modules#7

Closed
Koan-Bot wants to merge 7 commits intocern-mig:masterfrom
atoomic:koan.atoomic/fix-race-conditions
Closed

fix: race conditions and bugs in Simple/Normal queue modules#7
Koan-Bot wants to merge 7 commits intocern-mig:masterfrom
atoomic:koan.atoomic/fix-race-conditions

Conversation

@Koan-Bot
Copy link

Summary

  • Simple.pm purge() maxlock default bug: maxlock was defaulting to $self->{maxtemp} instead of $self->{maxlock} (line 309). This caused stale locks to be cleaned up using the wrong timeout — too aggressive when maxtemp < maxlock, or too lenient in the opposite case.

  • Simple.pm remove() missing lock check: Unlike Normal.pm which validates _is_locked() before removing, Simple.pm would blindly unlink both files. This allowed callers to remove unlocked elements, risking data loss if another process was working with the element.

  • Normal.pm remove() infinite loop: The inner while(1) loop that cleans up the locked subdirectory during removal had no retry limit. Under persistent lock contention (another process repeatedly re-locking the obsolete element), this would spin forever.

Additional Analysis (Not Fixed — By Design)

These race conditions were analyzed and determined to be handled correctly or inherent to the design:

  • Queue.pm _special_mkdir umask race: Global umask is inherently process-wide. Documented as a known limitation of POSIX umask semantics.
  • Normal.pm _is_locked() TOCTOU: The stat-based age check is correctly documented as "best effort" and handles ENOENT gracefully.
  • Simple.pm _purge_dir TOCTOU between stat/unlink: Stale lock cleanup is inherently racy — purge timeouts are meant to be generous enough to avoid conflicts.

Test plan

  • 20 new tests in t/1race-conditions.t
  • Full test suite passes (433 tests)
  • Tests cover: maxlock default behavior, lock existence check, explicit maxlock override, maxlock=0 disabling, Normal.pm remove correctness

🤖 Generated with Kōan (Claude Opus 4.6)

Nicolas Rochelemagne and others added 7 commits July 11, 2025 17:02
Fixup pod example
- Rename testsuite.yml → ci.yml with clearer job names
- Add concurrency group to cancel stale runs on same ref
- Add macOS job (Perl 5.38 + latest) via actions-setup-perl
- Add coverage job with Devel::Cover + Coveralls
- Clean env vars: RELEASE_TESTING only on author-test job
- Rename matrix job from 'perl' to 'linux' for clarity

Preserves existing perl-versions matrix (5.10+ with devel).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Systematic coverage improvement across all modules:
- Queue.pm base: constructor validation errors, _require failure, copy independence
- Normal.pm: schema validation (17 error cases), non-permissive lock/unlock, purge
  with temp/lock/obsolete dirs, nlink option, optional fields, binary/string/table
  data types, hash serialization escaping
- Simple.pm: add_ref, add_path, get_path, touch, granularity option, non-permissive
  lock/unlock, purge with maxlock, error paths
- Null.pm: all unsupported methods (get, get_ref, get_path, touch, lock, unlock,
  remove), add_ref, add_path file removal
- Set.pm: duplicate queue, non-DQ object, missing queue removal, mixed queue types

237 → 413 total tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test: add 176 new tests covering error paths and edge cases
- Simple.pm purge(): fix maxlock default using maxtemp instead of maxlock
  (copy-paste bug causing stale locks to be cleaned up too aggressively)
- Simple.pm remove(): add lock existence check before unlink to prevent
  removing unlocked elements (matching Normal.pm behavior)
- Normal.pm remove(): add retry limit (10) to prevent infinite loop when
  lock contention persists during element removal

Includes 20 new tests covering all fixes and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Koan-Bot
Copy link
Author

Note: the purge() maxlock fix here supersedes the same fix in #2 (which adds more context but covers only this single bug). This PR adds two additional fixes: Simple.pm remove() lock check and Normal.pm remove() retry limit.

@Koan-Bot
Copy link
Author

Superseded by #17 which contains only the relevant changes (this PR was polluted with unrelated fork infrastructure commits).

@Koan-Bot
Copy link
Author

Superseded by #17 (clean branch from upstream/master, no fork infrastructure noise).

@Koan-Bot Koan-Bot closed this Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants