Skip to content

feat: enhance notification system and migrate to pubsub/v2 to fix CI#642

Open
nothing0012 wants to merge 18 commits intomainfrom
zz/add-notif-1
Open

feat: enhance notification system and migrate to pubsub/v2 to fix CI#642
nothing0012 wants to merge 18 commits intomainfrom
zz/add-notif-1

Conversation

@nothing0012
Copy link
Contributor

@nothing0012 nothing0012 commented Dec 24, 2025

Description

This PR enhances the Flagr notification system with richer context, adds generic webhook support, implements retry logic with exponential backoff, adds comprehensive validation, and stabilizes the CI pipeline through a modern dependency upgrade.

Major Enhancements:

Notification System Improvements

  • Retry Logic with Exponential Backoff: Added configurable retry mechanism for HTTP-based providers (email, webhook) with exponential backoff and jitter to handle transient failures gracefully.
  • Configuration Options:
    • FLAGR_NOTIFICATION_MAX_RETRIES (default: 3) - Maximum retry attempts
    • FLAGR_NOTIFICATION_RETRY_BASE (default: 1s) - Base delay for backoff
    • FLAGR_NOTIFICATION_RETRY_MAX (default: 10s) - Maximum delay between retries
  • Concurrency Control: Added semaphore limiting (default: 100 concurrent sends) to prevent resource exhaustion under load.
  • Observability: Added detailed metrics (notification.sent) tagged with provider, operation, entity_type, and status for monitoring.
  • Startup Validation: Added ValidateConfig() call to validate notification configuration at startup and log warnings if misconfigured.
  • Multi-Notifier Support: Refactored to send notifications to all configured notifiers concurrently with aggregated error handling.

Bug Fixes & Reliability

  • Test Pollution Fix: Fixed race condition in sendNotification where notifiers were captured inside the goroutine. Now captures notifiers before spawning the goroutine to prevent test pollution between test runs.
  • Soft-Delete Handling: Fixed SaveFlagSnapshot to properly handle soft-deleted flags using Unscoped() queries, ensuring notifications work correctly for delete/restore operations.
  • Complete Notification Coverage: Added missing notifications for:
    • DeleteFlag (operation: delete)
    • RestoreFlag (operation: restore)
    • SetFlagEnabledState (operation: update)
  • Fixed Operation Constants: Changed hardcoded string "update" to proper notification.OperationUpdate constant in variant deletion.

Code Quality & Testing

  • Refactoring: Renamed hook.godispatch.go to better reflect its responsibility.
  • Interface Enhancement: Added Name() method to Notifier interface for better observability and metrics.
  • Comprehensive Tests: Added test coverage for all missing notification scenarios (delete, restore, enable/disable) and concurrent notification dispatch.
  • Improved Error Handling: Enhanced HTTP error handling with proper response body reading and cleanup.

Documentation Updates

  • Updated docs/flagr_notifications.md with new features, retry configuration, concurrency limits, and important notes about asynchronous delivery.
  • Clarified that notifications are for flag-related operations only (EntityType always "flag").
  • Added section on retry configuration and observability.

CI & Dependency Migration

  • Migrated cloud.google.com/go/pubsub from v1 to v2 to resolve deprecation warnings.
  • Updated Topic usage to Publisher in both producer logic and tests.
  • Resolved SA1019 lint warnings which were blocking the CI pipeline.
  • Verified that all CI checks (unit tests, integration tests, linting) are now passing.

Motivation and Context

The notification system needed better reliability, observability, and configurability for production use. The retry logic ensures transient failures don't result in lost notifications. The validation and metrics provide operational visibility. The bug fixes ensure all flag operations trigger appropriate notifications, even for soft-deleted entities.

How Has This Been Tested?

  • CI Suite: make ci passed with 0 lint issues and 0 test failures.
  • Unit Testing: Added comprehensive tests for DeleteFlag, RestoreFlag, and SetFlagEnabledState notifications. 100% coverage on new code paths. Fixed test pollution issue in concurrent notification tests.
  • Integration Testing: Verified handler-level notification triggers and async goroutine behavior.
  • Manual Verification: Confirmed retry logic works with transient failures, semaphore limits concurrency, and metrics are recorded correctly.

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Refactoring / Technical debt
  • Bug fix (non-breaking change which fixes an issue)
  • Documentation

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All lint checks passed locally (make ci).

@nothing0012 nothing0012 changed the title feat: enhance notification system with rich details and opt-in diffs feat: enhance notification system and migrate to pubsub/v2 to fix CI Dec 24, 2025
nothing0012 and others added 15 commits March 3, 2026 13:27
- Included flag description and pre/post change values in notifications
- Implemented privacy-first, opt-in JSON diffing (FLAGR_NOTIFICATION_DETAILED_DIFF_ENABLED)
- Added automatic pretty-printing for JSON diffs to improve readability
- Refactored notification package for better testability (removed public init, added SetNotifier)
- Added handler-level integration tests for notifications
- Upgraded cloud.google.com/go/pubsub to v2 to resolve deprecation warnings
- Fixed various test stability issues (DB transaction ordering, nil notifier panics)
Updated slack.go and email.go to use the defined Operation constants
instead of string literals for better consistency and type safety.
Refactored entity.SaveFlagSnapshot to accept notification.Operation
instead of a string, improving type safety and consistency across
handler calls.
…est coverage

- Renamed hook.go to dispatch.go to better reflect its purpose
- Add exponential backoff retry logic for HTTP providers (email, webhook)
- Add configurable retry parameters (max retries, base delay, max delay)
- Add startup validation for notification configuration
- Add concurrency limiting with semaphore (default 100)
- Add detailed metrics for notification success/failure
- Update SaveFlagSnapshot to handle soft-deleted flags correctly
- Add missing notifications for DeleteFlag, RestoreFlag, and SetFlagEnabledState
- Fix OperationUpdate constant usage in variant deletion
- Add comprehensive tests for new notification scenarios
- Update documentation with new features and configuration options

This provides a more robust and observable notification system with better error handling and test coverage.
…private API)

Simplify public API by making SendNotification private and keeping only SendFlagNotification as the public interface. This clarifies that the notification system currently only supports flag entities and prevents misuse.

This is a non-breaking refactoring - all internal callers already use SendFlagNotification.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants