Skip to content

feat(scrutiny): collector cron scheduling, smartctl error notifications, and override improvements#347

Merged
Starosdev merged 10 commits intomasterfrom
develop
Mar 11, 2026
Merged

feat(scrutiny): collector cron scheduling, smartctl error notifications, and override improvements#347
Starosdev merged 10 commits intomasterfrom
develop

Conversation

@Starosdev
Copy link
Owner

Summary

  • Add built-in cron scheduling to the collector so drives can be scanned on a configurable schedule without an external cron daemon
  • Add notifications for collector-side smartctl errors so users are alerted when the collector cannot read a drive
  • Fix device path case preservation for /dev/disk/by-id paths to prevent duplicate device registration
  • Show config-file attribute overrides alongside UI overrides in the settings panel; tighten input validation (WWN format, threshold ordering, non-negative values); enforce unique DB constraint on (protocol, attribute_id, wwn)
  • Update project documentation

Linked Issues

Closes #345

Test plan

  • All CI checks passed on develop
  • Verified on dev environment (http://192.168.1.33:8680)
  • Backend and frontend tests pass

Starosdev and others added 10 commits March 9, 2026 23:12
… (#340)

## Summary

- Remove `strings.ToLower` from scanned and config-override device file
paths in `TransformDetectedDevices`
- `/dev/disk/by-id/` paths containing uppercase manufacturer names and
serial numbers are now passed verbatim to `smartctl`
- Config override map lookups use `strings.EqualFold` so comparisons
remain case-insensitive without mutating stored paths
- WWN normalization to lowercase is unchanged
- Add `groupedDeviceKey` helper for case-insensitive map key lookup
- Add two tests covering uppercase by-id paths via config override and
via scan

## Linked Issues

Closes #335

## Test plan

- [x] `TestDetect_TransformDetectedDevices_UppercaseByIDPath` —
config-only uppercase path preserved verbatim
- [x] `TestDetect_TransformDetectedDevices_UppercaseByIDPathScanned` —
scanned uppercase path preserved verbatim
- [x] All existing detect package tests pass (`go test
./collector/pkg/detect/...`)
)

- Add notify_on_collector_error bool setting (default true) with DB migration m20260410000000
- Add NotifyFailureTypeCollectorError constant and NewCollectorError() constructor to notify package
- Add POST /api/device/:id/collector-error and POST /api/collector/scan-error endpoints
- Collector reports smartctl --scan failures via scan-error endpoint and --xall fatal errors via device collector-error endpoint
- Missing WWN after info phase also reported as scan-level error
- Notifications respect mute, quiet hours, and rate limiting via existing NotificationGate
- Frontend: add notify_on_collector_error to AppConfig interface, defaults, settings component, and settings dialog toggle
…ions

- fix(collector): report fatal xall exit codes (0x01, 0x02) to backend
  instead of silently continuing with unusable smartctl output
- fix(handler): extract sendNotificationViaGate helper to eliminate
  duplicate notification dispatch blocks in both error handlers
- fix(handler): accept optional device_name in scan-error payload so
  the notification subject shows the device name instead of "(unknown device)"
- fix(collector): pass device name through ReportScanError for info errors
- fix(models): document why bool-default-true fields cannot be set in ApplyDefaults
- test(handler): add handler tests covering bad JSON, notify disabled,
  device not found, and muted device paths
- CLAUDE.md (gitignored, local only): added 5 missing API endpoints
  (mqtt-sync, reset-status, missed-ping-timeout, collector-error, scan-error)
- README.md: add Collector-Side Error Notifications to Extended Features
- docs/DEPENDENCIES.md: update date, correct 6 Go package versions,
  add 6 new packages (paho.mqtt, fpdf, mapstructure/v2, jwt/v5, uuid,
  automaxprocs), mark Phase 2 updates complete
- docs/TROUBLESHOOTING_NOTIFICATIONS.md: add collector error notifications
  section documenting per-device and scan-level error reporting (#334)
- Deleted OVERRIDE_FIX_PROGRESS.md (was untracked, contained credentials)
## Summary

- Embeds `robfig/cron/v3` scheduler directly in the collector binary,
removing the dependency on an external `crond` daemon
- Adds `--cron-schedule`, `--run-startup`, and `--run-startup-sleep` CLI
flags
- Adds `cron.schedule`, `cron.run_on_startup`, and
`cron.startup_sleep_secs` YAML config keys
- Supports `COLLECTOR_CRON_SCHEDULE`, `COLLECTOR_RUN_STARTUP`, and
`COLLECTOR_RUN_STARTUP_SLEEP` env vars
- Without a schedule configured, single-run behavior is completely
unchanged
- Blocks on SIGINT/SIGTERM and drains the scheduler gracefully on
shutdown
- Documents the new config block in `example.collector.yaml`

## Linked Issues

Closes #333

## Test plan

- [x] `go build ./collector/cmd/collector-metrics/` passes
- [x] `go vet ./collector/...` passes
- [x] `go mod tidy` run — `robfig/cron/v3` correctly classified as
direct dependency
- [ ] Manual: run collector with `--cron-schedule "* * * * *"` and
verify recurring collection
- [ ] Manual: run collector without `--cron-schedule` and verify
single-run exit behavior is unchanged
- [ ] Manual: verify `COLLECTOR_CRON_SCHEDULE` env var is picked up
correctly
## Summary

- Add DB migration (`m20260411000000`) to enforce a unique constraint on
`(protocol, attribute_id, wwn)` in `attribute_overrides`, deduplicating
any existing rows first
- Update `AttributeOverride` model GORM tags to use `uniqueIndex` to
match the new constraint
- Add `GetAllOverridesForDisplay` repository method that merges UI
overrides (DB) with config-file overrides, so the settings panel shows
everything currently active — config overrides appear as read-only
(source: `config`, ID: 0)
- Add corresponding mock method for `GetAllOverridesForDisplay`
- Refactor `SaveAttributeOverride` to extract validation helpers
(`validateAttributeOverride`, `validateForceStatus`,
`validateThresholds`) to keep cognitive complexity in bounds
- Add input validation: at least one threshold required in custom mode,
`warn_above < fail_above`, non-negative values, and WWN hex-format check
for all action types
- Fix misleading comment in `applier.go` that implied thresholds can be
combined with `force_status` — they are mutually exclusive at the call
site
- Default new override form action to custom threshold (empty string) so
threshold fields are visible on first open instead of `ignore`
- Add 18 handler unit tests for `GetAttributeOverrides` and
`SaveAttributeOverride` (mock-based, no InfluxDB required)

## Linked Issues

Closes #345

## Test plan

- [x] `go build ./...` passes cleanly
- [x] All 18 new handler tests pass (`go test -v -run
TestAttributeOverride ./webapp/backend/pkg/web/handler/...`)
- [x] All overrides package unit tests pass (`go test -v
./webapp/backend/pkg/overrides/...`)
- [x] Migration ordering verified (m20260410 before m20260411 in slice)
- [x] WWN validation tested for all three action types
- [x] Threshold edge cases verified (`warn >= fail` correctly rejected)
@Starosdev Starosdev added backend Backend/Go related frontend Frontend/Angular related bug Something isn't working enhancement New feature or request labels Mar 11, 2026
@Starosdev Starosdev merged commit e329684 into master Mar 11, 2026
20 checks passed
Starosdev pushed a commit that referenced this pull request Mar 11, 2026
## [1.46.0](v1.45.0...v1.46.0) (2026-03-11)

### Features

* **scrutiny:** collector cron scheduling, smartctl error notifications, and override improvements ([#347](#347)) ([e329684](e329684)), closes [#345](#345)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend Backend/Go related bug Something isn't working enhancement New feature or request frontend Frontend/Angular related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(overrides): address gaps in attribute override system

1 participant