Add cost monitoring alerts (cost-hourly-spike, cost-daily-budget) with configurable thresholds and dashboard/docs integration #3

Open
yash194 wants to merge 1 commit into steadwing:main from yash194:llmcostrules

yash194 commented Feb 16, 2026

Solution Summary

This PR introduces first-class cost alerting with two new rules:

  1. cost-hourly-spike
  2. cost-daily-budget

Both rules are based on llm.token_usage events with numeric costUsd, support existing rule overrides (enabled, threshold, cooldownMinutes), and automatically inherit evaluator protections (cooldown deduplication + hourly alert cap).

Implementation Order and Changes

1. Core rule engine changes (src/core/rules.ts)

  1. Added sumInWindow(ctx, name, windowMs):
     • Reads a named window from ctx.state.windows.
     • Computes cutoff = ctx.now - windowMs.
     • Sums entry.value for entries with entry.ts >= cutoff.
     • Returns numeric total in-window value.
  2. Added pushSummedBucket(ctx, name, value, bucketMs, maxEntries):
     • Validates value (finite and > 0 only).
     • Computes bucketTs = floor(now / bucketMs) * bucketMs.
     • Loads/creates named window.
     • If latest entry is same bucket, accumulates into last.value.
     • Else appends { ts: bucketTs, value }.
     • Enforces bounded memory with maxEntries by dropping oldest items.
  3. Added rule cost-hourly-spike:
     • Trigger input: event.type === "llm.token_usage" and numeric event.costUsd.
     • Storage strategy: minute buckets (bucketMs = 60_000) in window "cost-hourly-spike-usd".
     • Window sum: last 60 minutes.
     • Fire condition: hourlyUsd > threshold.
     • Default threshold: 5.0 USD.
     • Default cooldown: 30 minutes.
     • Severity: warn.
  4. Added rule cost-daily-budget:
     • Trigger input: event.type === "llm.token_usage" and numeric event.costUsd.
     • Storage strategy: minute buckets in window "cost-daily-budget-usd".
     • Window sum: last 24 hours.
     • Fire condition: dailyUsd > threshold.
     • Default threshold: 20.0 USD.
     • Default cooldown: 6 hours.
     • Severity: error.
  5. Registered both rules in ALL_RULES, increasing active rules from 8 to 10.
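
For reviewers who want the two helpers in one place, here is a minimal sketch reconstructed from the bullets above. The WindowEntry and RuleContext shapes are assumptions made for illustration (in particular, that ctx.state.windows is a plain record of arrays); the actual types in src/core/rules.ts may differ.

```typescript
// Assumed shapes for illustration only; the real types in src/core/rules.ts may differ.
interface WindowEntry {
  ts: number; // bucket start time, epoch milliseconds
  value: number; // accumulated value for that bucket
}

interface RuleContext {
  now: number; // current time, epoch milliseconds
  state: { windows: Record<string, WindowEntry[]> };
}

// Sum entry.value over entries whose timestamp falls inside the last windowMs.
function sumInWindow(ctx: RuleContext, name: string, windowMs: number): number {
  const entries = ctx.state.windows[name] ?? [];
  const cutoff = ctx.now - windowMs;
  let total = 0;
  for (const entry of entries) {
    if (entry.ts >= cutoff) total += entry.value;
  }
  return total;
}

// Accumulate value into the bucket that contains ctx.now, creating it if needed.
function pushSummedBucket(
  ctx: RuleContext,
  name: string,
  value: number,
  bucketMs: number,
  maxEntries: number,
): void {
  // Only finite, strictly positive values are recorded.
  if (!Number.isFinite(value) || value <= 0) return;

  const bucketTs = Math.floor(ctx.now / bucketMs) * bucketMs;

  let entries = ctx.state.windows[name];
  if (entries === undefined) {
    entries = [];
    ctx.state.windows[name] = entries;
  }

  const last = entries[entries.length - 1];
  if (last !== undefined && last.ts === bucketTs) {
    last.value += value; // same bucket: accumulate
  } else {
    entries.push({ ts: bucketTs, value }); // new bucket
  }

  // Bound memory by dropping the oldest buckets.
  if (entries.length > maxEntries) {
    entries.splice(0, entries.length - maxEntries);
  }
}
```

Because bucket timestamps are floored to the bucket start, the oldest bucket can fall just outside the cutoff while still holding events that are inside the window, so the rolling sum is approximate to within one bucket.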

2. Dashboard status wiring (src/plugin/dashboard-routes.ts)

Added both new rule IDs to RULE_IDS so they appear in dashboard rule health/status:

  • cost-hourly-spike
  • cost-daily-budget

3. Startup logging cleanup (src/core/engine.ts, src/index.ts)

Replaced hardcoded "8 rules active" log text with dynamic ${ALL_RULES.length} rules active to avoid future drift.

4. Plugin config schema + UI hints (openclaw.plugin.json)

Added explicit schema entries under rules.properties for:

  • cost-hourly-spike
  • cost-daily-budget

Each supports:

  • enabled
  • threshold
  • cooldownMinutes

Added UI hints:

  • rules.cost-hourly-spike.threshold (Hourly Cost Threshold)
  • rules.cost-daily-budget.threshold (Daily Budget Threshold)
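
As a rough illustration of how the three override fields could combine with the defaults listed in section 1, the sketch below merges a per-rule override onto those defaults. The type names, the resolveRuleSettings helper, and the assumption that both rules are enabled by default are hypothetical and not taken from the plugin's code.

```typescript
// Hypothetical types; field names follow the override keys named in this PR.
interface RuleOverride {
  enabled?: boolean;
  threshold?: number;
  cooldownMinutes?: number;
}

interface RuleSettings {
  enabled: boolean;
  threshold: number;
  cooldownMinutes: number;
}

// Thresholds and cooldowns as stated in the PR description.
// enabled: true is an assumption for this sketch.
const COST_RULE_DEFAULTS: Record<string, RuleSettings> = {
  "cost-hourly-spike": { enabled: true, threshold: 5.0, cooldownMinutes: 30 },
  "cost-daily-budget": { enabled: true, threshold: 20.0, cooldownMinutes: 360 },
};

// Merge a user-supplied override onto the defaults for one rule.
function resolveRuleSettings(
  ruleId: string,
  overrides: Record<string, RuleOverride> = {},
): RuleSettings {
  const defaults = COST_RULE_DEFAULTS[ruleId];
  if (defaults === undefined) {
    throw new Error(`Unknown rule: ${ruleId}`);
  }
  const override = overrides[ruleId] ?? {};
  return {
    enabled: override.enabled ?? defaults.enabled,
    threshold: override.threshold ?? defaults.threshold,
    cooldownMinutes: override.cooldownMinutes ?? defaults.cooldownMinutes,
  };
}

// Example: raise the hourly threshold to 10 USD and keep everything else.
const hourlySettings = resolveRuleSettings("cost-hourly-spike", {
  "cost-hourly-spike": { threshold: 10 },
});
```

With this override, hourlySettings.threshold is 10 while cooldownMinutes stays at the default of 30.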

5. Documentation updates (README.md, GUIDE.md)

  1. Updated rule count from 8 to 10.
  2. Added both cost rules to the Alert Rules table with defaults.
  3. Updated config example to include cost rule threshold overrides.
  4. Updated GUIDE text describing evaluator/rules and capabilities.

Why sumInWindow + pushSummedBucket were introduced

sumInWindow

Needed because the existing helper only counts entries.
Cost alerting requires summing numeric costUsd values across time ranges (60m/24h), not counting event frequency.

pushSummedBucket

Needed to avoid undercounting and memory growth issues:

  • If every cost event were a raw entry, older entries could be dropped by max-window limits before daily sums complete.
  • Bucketing consolidates many events per minute into one aggregate entry.
  • This keeps rolling sums accurate enough for alerting while maintaining bounded memory.
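
To make the undercounting concern concrete, here is a small simulation. The traffic level (10 events per minute), the per-event cost (1 cent), and the raw-entry cap (1000) are made-up numbers chosen for illustration; they are not values from the plugin.

```typescript
const MINUTE_MS = 60_000;
const MINUTES_PER_DAY = 24 * 60;
const EVENTS_PER_MINUTE = 10; // hypothetical traffic level
const COST_CENTS_PER_EVENT = 1; // integer cents keep the demo free of float rounding
const RAW_ENTRY_CAP = 1000; // hypothetical cap on a raw, per-event window

interface DaySimulation {
  trueCents: number; // what was actually spent
  rawCents: number; // what a capped per-event window still remembers
  bucketCents: number; // what minute buckets remember
  bucketCount: number; // entries needed by the bucketed window
}

function simulateDay(): DaySimulation {
  const raw: number[] = [];
  const buckets: { ts: number; cents: number }[] = [];
  let trueCents = 0;

  for (let minute = 0; minute < MINUTES_PER_DAY; minute++) {
    const bucketTs = minute * MINUTE_MS;
    for (let i = 0; i < EVENTS_PER_MINUTE; i++) {
      trueCents += COST_CENTS_PER_EVENT;

      // Raw strategy: one entry per event, oldest dropped at the cap.
      raw.push(COST_CENTS_PER_EVENT);
      if (raw.length > RAW_ENTRY_CAP) raw.shift();

      // Bucketed strategy: one entry per minute, accumulated in place.
      const last = buckets[buckets.length - 1];
      if (last !== undefined && last.ts === bucketTs) {
        last.cents += COST_CENTS_PER_EVENT;
      } else {
        buckets.push({ ts: bucketTs, cents: COST_CENTS_PER_EVENT });
      }
    }
  }

  const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);
  return {
    trueCents,
    rawCents: sum(raw),
    bucketCents: sum(buckets.map((b) => b.cents)),
    bucketCount: buckets.length,
  };
}

const result = simulateDay();
```

Under these assumptions the capped raw window remembers only 1000 cents of the 14400 actually spent, while 1440 minute buckets account for the full amount. A 24-hour window never needs more than about 1440 one-minute buckets regardless of event volume.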

Rule Logic (General)

For each incoming llm.token_usage event with numeric costUsd:

  1. Add cost into minute bucket window.
  2. Compute rolling sum for the relevant time horizon.
  3. Compare against effective threshold (override-aware).
  4. If exceeded, emit alert.
  5. Evaluator applies existing global protections:
  • fingerprint cooldown dedup
  • hourly alert cap
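
Steps 1 to 4 can be sketched as a single factory that both rules could share. This is an illustrative reading of the flow, not the code in this PR: makeCostRule, CostRuleSpec, and CostAlert are hypothetical names, state is kept in a closure rather than in ctx.state.windows, and step 5 (cooldown dedup and the hourly alert cap) is left to the existing evaluator.

```typescript
interface UsageEvent {
  type: string;
  costUsd?: unknown;
}

interface CostRuleSpec {
  id: string;
  windowMs: number; // rolling horizon, e.g. 60 minutes or 24 hours
  defaultThreshold: number; // USD
  severity: "warn" | "error";
}

interface CostAlert {
  ruleId: string;
  severity: "warn" | "error";
  totalUsd: number;
  threshold: number;
}

const BUCKET_MS = 60_000; // minute buckets, as described in section 1

function makeCostRule(spec: CostRuleSpec) {
  const buckets: { ts: number; value: number }[] = [];

  return function evaluate(
    event: UsageEvent,
    now: number,
    thresholdOverride?: number,
  ): CostAlert | null {
    // Only llm.token_usage events with a numeric costUsd are considered.
    if (event.type !== "llm.token_usage") return null;
    const cost = event.costUsd;
    if (typeof cost !== "number") return null;

    // Step 1: add the cost to the current minute bucket (finite, > 0 only).
    if (Number.isFinite(cost) && cost > 0) {
      const bucketTs = Math.floor(now / BUCKET_MS) * BUCKET_MS;
      const last = buckets[buckets.length - 1];
      if (last !== undefined && last.ts === bucketTs) {
        last.value += cost;
      } else {
        buckets.push({ ts: bucketTs, value: cost });
      }
    }

    // Step 2: drop buckets outside the horizon, then sum what remains.
    const cutoff = now - spec.windowMs;
    while (buckets.length > 0 && buckets[0]!.ts < cutoff) buckets.shift();
    let totalUsd = 0;
    for (const bucket of buckets) totalUsd += bucket.value;

    // Step 3: the override, when present, wins over the default threshold.
    const threshold = thresholdOverride ?? spec.defaultThreshold;

    // Step 4: emit an alert only when the rolling sum exceeds the threshold.
    if (totalUsd <= threshold) return null;
    return { ruleId: spec.id, severity: spec.severity, totalUsd, threshold };
  };
}

// The two rules from this PR, expressed with the sketch above.
const hourlySpike = makeCostRule({
  id: "cost-hourly-spike",
  windowMs: 60 * 60_000,
  defaultThreshold: 5.0,
  severity: "warn",
});
const dailyBudget = makeCostRule({
  id: "cost-daily-budget",
  windowMs: 24 * 60 * 60_000,
  defaultThreshold: 20.0,
  severity: "error",
});
```

Expressing both rules through one factory highlights that they differ only in horizon, default threshold, and severity.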

Compatibility and Risk

  • Backward compatible with existing configs.
  • No change to existing rule behavior.
  • New logic only activates on llm.token_usage with valid costUsd.
  • No evaluator core behavior changes required.

Validation Performed

  • openclaw.plugin.json parsed successfully.
  • npm run typecheck passed.

Fixes #2

@NILAY1556 NILAY1556 self-requested a review February 16, 2026 18:39
@NILAY1556
Collaborator

Thanks for contributing, @yash194.
We are in the middle of a revamp to improve observability.
We will let you know here once the revamp is done, and then we can discuss this.

I hope that's fine. BTW, great observation in finding this gap.

@yash194
Author

yash194 commented Feb 16, 2026

@NILAY1556 That's great, thanks! Just one thing: I have several more ideas in mind. Should I wait until after the revamp to open those issues and PRs, or can I open them now?

@NILAY1556
Collaborator

> @NILAY1556 That's great, thanks! Just one thing: I have several more ideas in mind. Should I wait until after the revamp to open those issues and PRs, or can I open them now?

You can create the issues now; we can discuss them over there.

Development

Successfully merging this pull request may close these issues.

[Feature]: Add configurable LLM cost alerts (hourly spike + daily budget)