fix(batch): more high cardinality metric attribute fixes by ericallam · Pull Request #2846 · triggerdotdev/trigger.dev

ericallam · 2026-01-08T09:58:46Z

No description provided.

changeset-bot · 2026-01-08T09:58:49Z

⚠️ No Changeset found

Latest commit: 758e520

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2026-01-08T09:59:01Z

Walkthrough

This pull request introduces OpenTelemetry metrics cardinality guidelines and refactors batch-queue metrics. A new guideline document is added at .cursor/rules/otel-metrics.mdc that defines cardinality best practices, provides do's and don'ts, and includes TypeScript examples. Concurrently, metrics in internal-packages/run-engine/src/batch-queue/index.ts are updated to replace environment ID-based attribute keys with environment_type keys derived from meta.environmentType or options.environmentType, applied consistently across enqueued, processed, failed, and duration-related metrics.
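A minimal sketch of the attribute change the walkthrough describes. It assumes simplified, hypothetical shapes for the batch metadata and options (BatchMetaLike, BatchQueueOptionsLike) and a hypothetical batchMetricAttributes helper; it is not the actual code in index.ts. The sketch also collapses both sources into one helper, with meta.environmentType taking precedence over options.environmentType, whereas the real code may read each source at different call sites. It follows the repository guideline of preferring const objects and string unions over enums.

```typescript
// Bounded set of environment types (the four RuntimeEnvironmentType values).
const EnvironmentType = {
  PRODUCTION: "PRODUCTION",
  STAGING: "STAGING",
  DEVELOPMENT: "DEVELOPMENT",
  PREVIEW: "PREVIEW",
} as const;

type EnvironmentType = (typeof EnvironmentType)[keyof typeof EnvironmentType];

// Hypothetical, simplified shapes; the real BatchMeta and options types carry more fields.
type BatchMetaLike = {
  environmentId: string; // unbounded: one value per environment, so not a metric attribute
  environmentType?: EnvironmentType;
};

type BatchQueueOptionsLike = {
  environmentType?: EnvironmentType;
};

// Before (per the review): an envId attribute, i.e. one time series per environment.
// After: environment_type, drawn from a fixed set of four values.
function batchMetricAttributes(
  meta: BatchMetaLike | null,
  options: BatchQueueOptionsLike
): Record<string, string> {
  // Assumed precedence: batch meta first, then the queue-level option.
  const environmentType = meta?.environmentType ?? options.environmentType;
  return environmentType ? { environment_type: environmentType } : {};
}
```

Whatever the environment ID, the resulting attribute set can only take one of four values (or be empty), which is what keeps the series count bounded.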

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	No pull request description was provided by the author. The required template sections (Checklist, Testing, Changelog, Screenshots) are entirely missing.	Add a comprehensive pull request description following the repository template, including testing steps, changelog entry, and completion of the checklist items.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly describes the main change: fixing high-cardinality metric attributes in the batch module, which aligns with both file changes (metrics cardinality guidance and metric attribute key updates).
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


claude · 2026-01-08T10:00:07Z

PR Review: fix(batch): more high cardinality metric attribute fixes

Summary

This PR correctly addresses high-cardinality metric attribute issues in the BatchQueue by replacing unbounded attributes (envId, itemCount) with low-cardinality alternatives (environment_type, errorCode).

✅ Code Quality & Best Practices

Positives:

Consistent attribute naming: All metrics now use environment_type instead of envId, maintaining consistency across the codebase
Good documentation: The new .cursor/rules/otel-metrics.mdc file provides clear guidelines with examples for future contributors
Appropriate reference: Points to schedule-engine as a good example for low-cardinality metrics

Suggestions:

The rules file references internal-packages/schedule-engine/src/engine/index.ts as a good example; make sure this reference stays up to date if that file is refactored.

🐛 Potential Issues

Moved metric recording location (lines 735-736): The itemQueueTimeHistogram?.record() call was moved after the getMeta() call. This is correct, since recording now requires meta.environmentType, but note that if getMeta() fails (meta is null), the queue time metric won't be recorded at all for that item. This is likely acceptable, since the item would be invalid anyway.
Attribute consistency check: Verify that errorCode values are bounded. Looking at the code:
- result.errorCode comes from the ProcessBatchItemCallback return type
- "UNEXPECTED_ERROR" is hardcoded
If error codes are a controlled set, this is fine. If they can be arbitrary strings from external sources, this could still be a cardinality issue.
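The moved queue-time recording from the first item above can be sketched as follows. processItem, the getMeta parameter, and the QueueTimeRecorder type are hypothetical stand-ins for the BatchQueue internals (the real method names and signatures in index.ts may differ). The queue time is calculated up front, but the sample is recorded only once the meta, and with it the environment_type attribute, is available; a null meta means no sample for that item, as the review notes.

```typescript
type EnvironmentType = "PRODUCTION" | "STAGING" | "DEVELOPMENT" | "PREVIEW";

// Hypothetical, simplified batch meta.
type BatchMeta = { environmentType: EnvironmentType };

// Hypothetical recorder standing in for itemQueueTimeHistogram?.record(value, attributes).
type QueueTimeRecorder = (ms: number, attributes: { environment_type: EnvironmentType }) => void;

async function processItem(
  enqueuedAt: number,
  now: number,
  getMeta: () => Promise<BatchMeta | null>,
  recordQueueTime: QueueTimeRecorder
): Promise<"processed" | "skipped"> {
  // Calculate queue time first, while the timestamps are at hand.
  const queueTimeMs = now - enqueuedAt;

  // environment_type lives on the batch meta, so fetch it before recording.
  const meta = await getMeta();
  if (!meta) {
    // Invalid item: no meta, so no queue-time sample is recorded.
    return "skipped";
  }

  recordQueueTime(queueTimeMs, { environment_type: meta.environmentType });
  return "processed";
}
```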

📊 Performance Considerations

Positive impacts:

Reduced cardinality will significantly decrease memory usage in metrics backends (Axiom, Prometheus, etc.)
Faster metric queries due to fewer unique time series
Lower storage costs

No negative performance impact: the changes only modify attribute values, not the metric recording logic itself.
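To make the cardinality argument concrete: in the worst case, the number of time series a single metric can produce is the product of the number of distinct values of each of its attributes. The sketch below uses illustrative numbers only; the 10,000 environments and 5 error codes are assumptions chosen for the example, not measurements from this system.

```typescript
// Worst-case series for one metric = product of distinct values per attribute.
function worstCaseSeries(attributeCardinalities: Record<string, number>): number {
  return Object.values(attributeCardinalities).reduce((acc, n) => acc * n, 1);
}

// Illustrative only: assume 10,000 environments and 5 distinct error codes.
const before = worstCaseSeries({ envId: 10000, errorCode: 5 });
const after = worstCaseSeries({ environment_type: 4, errorCode: 5 });
```

With envId, the series count grows with every new environment; with environment_type, it is capped at 4 times the number of distinct error codes, which is also why the boundedness of errorCode matters.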

🔒 Security Concerns

No security concerns with this PR. The changes:

Don't expose any new data
Remove potentially sensitive IDs (envId) from metrics, which improves privacy

🧪 Test Coverage

The PR doesn't include tests, but this is reasonable because:

Metric attributes are typically tested through integration/observability tests rather than unit tests
The changes are straightforward attribute value swaps
The underlying BatchMeta.environmentType field already exists and is typed as RuntimeEnvironmentType enum (which is bounded)

📝 Minor Observations

Comment update (line 703): Changed "Record queue time metric" to "Calculate queue time" - this is accurate since the recording happens later now
errorCode retention: The errorCode attribute is kept on itemsFailedCounter which is appropriate - it's a bounded set of error types (from the callback interface)

✅ Recommendation

LGTM. This PR correctly fixes the high-cardinality issue. The environment_type enum (PRODUCTION, STAGING, DEVELOPMENT, PREVIEW) has exactly 4 values, making it ideal for metric attributes.

The documentation in .cursor/rules/otel-metrics.mdc is a nice addition that will help prevent similar issues in the future.

vibe-kanban-cloud · 2026-01-08T10:02:23Z

Review Complete

Your review story is ready!


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @.cursor/rules/otel-metrics.mdc:
- Around lines 1-53: The referenced schedule engine example violates the guideline by attaching a high-cardinality UUID attribute. Change the metric call devEnvironmentCheckCounter.add(1, { environment_id: instance.environment.id }) to pass a low-cardinality attribute instead (e.g., environment_type: instance.environment.type, or another bounded enum) so the metric uses environment_type rather than environment_id, and update any related tests or telemetry consumers that expect the old attribute name. Alternatively, if you prefer to keep the schedule engine unchanged, update the guideline's reference text to point to a corrected example that uses environment_type.

In @internal-packages/run-engine/src/batch-queue/index.ts:
- Around lines 289-292: The itemsFailedCounter metric currently includes a free-form string attribute, errorCode (set from the ProcessBatchItemCallback result at the itemsFailedCounter usages around lines 813 and 853), which can cause high cardinality. Either stop including errorCode in the metric attributes (remove it from the itemsFailedCounter.add calls and rely on structured logs for the raw error), or normalize it to a bounded enum/union of known error categories (map arbitrary callback error strings to a small set of stable codes such as "validation", "transient", "permanent", and "unknown" before adding to itemsFailedCounter). Update the code paths that increment itemsFailedCounter (references: itemsFailedCounter, ProcessBatchItemCallback result handling) so they no longer pass an unbounded errorCode directly: either pass one of the limited enum values, or omit the attribute entirely and emit the full error to logs.
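A minimal sketch of the normalization option, assuming the four categories suggested above. The ErrorCategory const object, the normalizeErrorCode helper, and the raw codes in the lookup table (other than "UNEXPECTED_ERROR", which the Claude review notes is hardcoded) are hypothetical; a real mapping would depend on the codes ProcessBatchItemCallback actually returns. A Map is used rather than a plain object so that inherited keys such as "toString" cannot leak through as categories.

```typescript
const ErrorCategory = {
  VALIDATION: "validation",
  TRANSIENT: "transient",
  PERMANENT: "permanent",
  UNKNOWN: "unknown",
} as const;

type ErrorCategory = (typeof ErrorCategory)[keyof typeof ErrorCategory];

// Hypothetical raw codes; anything not listed collapses to "unknown".
const KNOWN_CODES: ReadonlyMap<string, ErrorCategory> = new Map<string, ErrorCategory>([
  ["VALIDATION_ERROR", ErrorCategory.VALIDATION],
  ["RATE_LIMITED", ErrorCategory.TRANSIENT],
  ["TIMEOUT", ErrorCategory.TRANSIENT],
  ["TASK_NOT_FOUND", ErrorCategory.PERMANENT],
  ["UNEXPECTED_ERROR", ErrorCategory.UNKNOWN],
]);

// Maps an arbitrary callback error string onto a bounded set of four values.
function normalizeErrorCode(raw: string | undefined): ErrorCategory {
  if (raw === undefined) return ErrorCategory.UNKNOWN;
  return KNOWN_CODES.get(raw) ?? ErrorCategory.UNKNOWN;
}
```

At the call sites, itemsFailedCounter would then receive normalizeErrorCode(result.errorCode) in place of the raw string, while the raw code goes to structured logs.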

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dfbe1d and 758e520.

📒 Files selected for processing (2)

.cursor/rules/otel-metrics.mdc
internal-packages/run-engine/src/batch-queue/index.ts

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

internal-packages/run-engine/src/batch-queue/index.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

internal-packages/run-engine/src/batch-queue/index.ts

**/*.{js,ts,jsx,tsx,json,md,css,scss}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier

Files:

internal-packages/run-engine/src/batch-queue/index.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)

GitHub Check: Cursor Bugbot
GitHub Check: Analyze (javascript-typescript)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
GitHub Check: typecheck / typecheck

🔇 Additional comments (1)

internal-packages/run-engine/src/batch-queue/index.ts (1)

703-736: Comment improvements enhance code clarity.

The updated comments accurately reflect:

Queue time calculation logic (lines 703-704)

The requirement to fetch meta for the environment_type attribute (lines 735-736)

This makes the code's intent clearer, especially the dependency on meta.environmentType for the metric.


0ski approved these changes Jan 8, 2026


coderabbitai bot reviewed Jan 8, 2026


.cursor/rules/otel-metrics.mdc: conversation resolved

internal-packages/run-engine/src/batch-queue/index.ts: conversation resolved

mpcgrid approved these changes Jan 8, 2026


ericallam merged commit 7c2e78c into main Jan 8, 2026
38 checks passed

ericallam deleted the ea-branch-113-3 branch January 8, 2026 10:07

ericallam merged 1 commit into main from ea-branch-113-3
3 participants