feat(scrutiny): collector cron scheduling, smartctl error notifications, and override improvements#347
Conversation
… (#340) ## Summary - Remove `strings.ToLower` from scanned and config-override device file paths in `TransformDetectedDevices` - `/dev/disk/by-id/` paths containing uppercase manufacturer names and serial numbers are now passed verbatim to `smartctl` - Config override map lookups use `strings.EqualFold` so comparisons remain case-insensitive without mutating stored paths - WWN normalization to lowercase is unchanged - Add `groupedDeviceKey` helper for case-insensitive map key lookup - Add two tests covering uppercase by-id paths via config override and via scan ## Linked Issues Closes #335 ## Test plan - [x] `TestDetect_TransformDetectedDevices_UppercaseByIDPath` — config-only uppercase path preserved verbatim - [x] `TestDetect_TransformDetectedDevices_UppercaseByIDPathScanned` — scanned uppercase path preserved verbatim - [x] All existing detect package tests pass (`go test ./collector/pkg/detect/...`)
) - Add notify_on_collector_error bool setting (default true) with DB migration m20260410000000 - Add NotifyFailureTypeCollectorError constant and NewCollectorError() constructor to notify package - Add POST /api/device/:id/collector-error and POST /api/collector/scan-error endpoints - Collector reports smartctl --scan failures via scan-error endpoint and --xall fatal errors via device collector-error endpoint - Missing WWN after info phase also reported as scan-level error - Notifications respect mute, quiet hours, and rate limiting via existing NotificationGate - Frontend: add notify_on_collector_error to AppConfig interface, defaults, settings component, and settings dialog toggle
…ions - fix(collector): report fatal xall exit codes (0x01, 0x02) to backend instead of silently continuing with unusable smartctl output - fix(handler): extract sendNotificationViaGate helper to eliminate duplicate notification dispatch blocks in both error handlers - fix(handler): accept optional device_name in scan-error payload so the notification subject shows the device name instead of "(unknown device)" - fix(collector): pass device name through ReportScanError for info errors - fix(models): document why bool-default-true fields cannot be set in ApplyDefaults - test(handler): add handler tests covering bad JSON, notify disabled, device not found, and muted device paths
- CLAUDE.md (gitignored, local only): added 5 missing API endpoints (mqtt-sync, reset-status, missed-ping-timeout, collector-error, scan-error) - README.md: add Collector-Side Error Notifications to Extended Features - docs/DEPENDENCIES.md: update date, correct 6 Go package versions, add 6 new packages (paho.mqtt, fpdf, mapstructure/v2, jwt/v5, uuid, automaxprocs), mark Phase 2 updates complete - docs/TROUBLESHOOTING_NOTIFICATIONS.md: add collector error notifications section documenting per-device and scan-level error reporting (#334) - Deleted OVERRIDE_FIX_PROGRESS.md (was untracked, contained credentials)
## Summary - Embeds `robfig/cron/v3` scheduler directly in the collector binary, removing the dependency on an external `crond` daemon - Adds `--cron-schedule`, `--run-startup`, and `--run-startup-sleep` CLI flags - Adds `cron.schedule`, `cron.run_on_startup`, and `cron.startup_sleep_secs` YAML config keys - Supports `COLLECTOR_CRON_SCHEDULE`, `COLLECTOR_RUN_STARTUP`, and `COLLECTOR_RUN_STARTUP_SLEEP` env vars - Without a schedule configured, single-run behavior is completely unchanged - Blocks on SIGINT/SIGTERM and drains the scheduler gracefully on shutdown - Documents the new config block in `example.collector.yaml` ## Linked Issues Closes #333 ## Test plan - [x] `go build ./collector/cmd/collector-metrics/` passes - [x] `go vet ./collector/...` passes - [x] `go mod tidy` run — `robfig/cron/v3` correctly classified as direct dependency - [ ] Manual: run collector with `--cron-schedule "* * * * *"` and verify recurring collection - [ ] Manual: run collector without `--cron-schedule` and verify single-run exit behavior is unchanged - [ ] Manual: verify `COLLECTOR_CRON_SCHEDULE` env var is picked up correctly
## Summary - Add DB migration (`m20260411000000`) to enforce a unique constraint on `(protocol, attribute_id, wwn)` in `attribute_overrides`, deduplicating any existing rows first - Update `AttributeOverride` model GORM tags to use `uniqueIndex` to match the new constraint - Add `GetAllOverridesForDisplay` repository method that merges UI overrides (DB) with config-file overrides, so the settings panel shows everything currently active — config overrides appear as read-only (source: `config`, ID: 0) - Add corresponding mock method for `GetAllOverridesForDisplay` - Refactor `SaveAttributeOverride` to extract validation helpers (`validateAttributeOverride`, `validateForceStatus`, `validateThresholds`) to keep cognitive complexity in bounds - Add input validation: at least one threshold required in custom mode, `warn_above < fail_above`, non-negative values, and WWN hex-format check for all action types - Fix misleading comment in `applier.go` that implied thresholds can be combined with `force_status` — they are mutually exclusive at the call site - Default new override form action to custom threshold (empty string) so threshold fields are visible on first open instead of `ignore` - Add 18 handler unit tests for `GetAttributeOverrides` and `SaveAttributeOverride` (mock-based, no InfluxDB required) ## Linked Issues Closes #345 ## Test plan - [x] `go build ./...` passes cleanly - [x] All 18 new handler tests pass (`go test -v -run TestAttributeOverride ./webapp/backend/pkg/web/handler/...`) - [x] All overrides package unit tests pass (`go test -v ./webapp/backend/pkg/overrides/...`) - [x] Migration ordering verified (m20260410 before m20260411 in slice) - [x] WWN validation tested for all three action types - [x] Threshold edge cases verified (`warn >= fail` correctly rejected)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
/dev/disk/by-idpaths to prevent duplicate device registration(protocol, attribute_id, wwn)Linked Issues
Closes #345
Test plan