@THardy98
Contributor

@THardy98 THardy98 commented Jan 8, 2026

What was changed

Add a separate HTTP server (a sidecar) for process metrics and worker metadata, started when spawning SDK workers. This works for all language workers (Go/Python/Java/TS/.NET).

The sidecar serves:

  • /metrics: Process metrics (CPU/memory) for the worker PID
  • /info: Worker metadata (sdk_version, build_id)

The /info endpoint only returns fields that run-scenario doesn't already know. During metrics export, /info is fetched to populate additional fields in the parquet output.

Why?

We opt for a Prometheus server sidecar for process metrics because core-based SDKs already run a Prometheus server that is started by core, which makes it difficult to:

  • expand its registry to include process metrics
  • add a handler to surface additional metadata (i.e. worker metadata from /info)

A sidecar server avoids both problems and keeps process-metrics capture simple.

We add an /info endpoint to this sidecar server so that the scenario runner can fetch additional worker metadata for metrics export. This is useful when the scenario runner and the worker run on different machines (as is the case in cloud worker testing).

How was this tested?

Added some basic integration tests.

Add a separate HTTP server for process metrics and worker metadata,
started by run.go after spawning SDK workers. This works for all
language workers (Go/Python/Java/TS/.NET).

The sidecar serves:
- /metrics: Process metrics (CPU/memory) for the worker PID
- /info: Worker metadata (sdk_version, build_id)

The /info endpoint only returns fields that run-scenario doesn't
already know. During metrics export, /info is fetched to populate
additional fields in the parquet output.
@THardy98 THardy98 requested a review from a team as a code owner January 8, 2026 15:14
@THardy98 THardy98 requested a review from Sushisource January 8, 2026 15:14
@THardy98 THardy98 force-pushed the add-process-metrics-sidecar branch 2 times, most recently from 28bf7ba to bd291f5 on January 8, 2026 22:09
- Add language field to /info endpoint and MetricLine parquet export
- Add --prom-export-process-job flag (default: omes-worker-process) to
  query process metrics from a Prometheus job separate from the SDK metrics job
- Pass language from runner to StartProcessMetricsSidecar
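A sketch of how the new flag could scope process-metric queries to the sidecar's job, written with Go's standard `flag` package (the real CLI may parse flags differently). `parseProcessJob` and `processQuery` are hypothetical helpers. `process_cpu_seconds_total` and `process_resident_memory_bytes` are the conventional Prometheus process-collector metric names; they are assumed here, not confirmed by the PR.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseProcessJob is a hypothetical stand-in for the export command's flag
// parsing. The flag name and default come from the commit message above.
func parseProcessJob(args []string) (string, error) {
	fs := flag.NewFlagSet("export", flag.ContinueOnError)
	job := fs.String("prom-export-process-job", "omes-worker-process",
		"Prometheus job that scrapes the process-metrics sidecar")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *job, nil
}

// processQuery builds a PromQL selector restricted to the given job label,
// so process metrics are not mixed with SDK metrics from another job.
func processQuery(metric, job string) string {
	return fmt.Sprintf("%s{job=%q}", metric, job)
}

func main() {
	job, err := parseProcessJob(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println(processQuery("process_resident_memory_bytes", job))
	fmt.Printf("rate(%s[1m])\n", processQuery("process_cpu_seconds_total", job))
}
```

Because the sidecar is scraped under its own job, a `job` selector is enough to separate process metrics from the SDK metrics that the worker's core-managed server exposes.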
@THardy98 THardy98 force-pushed the add-process-metrics-sidecar branch from bd291f5 to 0ec4989 on January 9, 2026 15:42
- Add --auth-header argument to .NET, TypeScript, Python, and Java workers
- Fix TLS flag parsing in TypeScript and Python to accept string values
- Add --build-id argument to non-Go workers
@THardy98 THardy98 force-pushed the add-process-metrics-sidecar branch from 0ec4989 to 94c9c9a on January 9, 2026 16:55
This flag allows specifying the SDK version/ref to report in metrics
separately from the --version flag (used for build-time SDK selection).

The flag is:
- Used by the sidecar's /info endpoint as sdk_version
- NOT passed through to the actual worker process
- Falls back to --version value if not specified

This enables reporting the actual git ref (e.g., "master") in metrics
when building from a local SDK directory (--version ./repo).
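The fallback rule above fits in a few lines. This sketch uses a hypothetical function name; its two inputs correspond to the `MetricsVersionTag` field and the `--version` value.

```go
package main

import "fmt"

// reportedSDKVersion picks the value /info reports as sdk_version: the
// metrics version tag when it is set, otherwise the --version value.
// The function name is hypothetical; the fallback rule is the PR's.
func reportedSDKVersion(metricsVersionTag, version string) string {
	if metricsVersionTag != "" {
		return metricsVersionTag
	}
	return version
}

func main() {
	// Building from a local checkout: --version is a path, so a tag is given.
	fmt.Println(reportedSDKVersion("master", "./repo")) // master
	// No tag given: fall back to --version.
	fmt.Println(reportedSDKVersion("", "v1.2.3")) // v1.2.3
}
```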
@THardy98 THardy98 force-pushed the add-process-metrics-sidecar branch from 689aff8 to 237999d on January 9, 2026 20:46
The sdk_version from the sidecar's /info endpoint is now included
in the exported parquet metrics file, making it available for
analysis and dashboards.
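A sketch of stamping `/info` metadata onto each exported row before it is written. `MetricLine` is named in the PR, but the fields shown here are a trimmed, hypothetical subset, and the parquet writer itself is omitted to keep the example dependency-free.

```go
package main

import "fmt"

// workerInfo holds the /info fields mentioned in the PR
// (sdk_version, build_id, and the later-added language).
type workerInfo struct {
	SDKVersion string
	BuildID    string
	Language   string
}

// MetricLine is a hypothetical, trimmed-down export row; the real type's
// full field set is not shown in the PR.
type MetricLine struct {
	Metric     string
	Value      float64
	SDKVersion string
	BuildID    string
	Language   string
}

// withWorkerInfo returns copies of lines stamped with the sidecar's metadata,
// so each exported row records the SDK version it was measured against.
// The input slice is left unmodified.
func withWorkerInfo(lines []MetricLine, info workerInfo) []MetricLine {
	out := make([]MetricLine, len(lines))
	for i, l := range lines {
		l.SDKVersion = info.SDKVersion
		l.BuildID = info.BuildID
		l.Language = info.Language
		out[i] = l
	}
	return out
}

func main() {
	rows := []MetricLine{
		{Metric: "process_resident_memory_bytes", Value: 1.5e8},
		{Metric: "process_cpu_seconds_total", Value: 12.3},
	}
	info := workerInfo{SDKVersion: "master", BuildID: "build-42", Language: "python"}
	for _, r := range withWorkerInfo(rows, info) {
		fmt.Printf("%s=%g sdk_version=%s language=%s\n", r.Metric, r.Value, r.SDKVersion, r.Language)
	}
}
```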
@THardy98 THardy98 force-pushed the add-process-metrics-sidecar branch from 1193946 to f6e7c49 on January 9, 2026 21:56
Member

@Sushisource Sushisource left a comment

This mostly makes sense to me - an update to the README describing this architecture would be a good idea

})
})

// Start server
Member

Boring comment (so are the other ones inside the func, really)

Contributor Author

removed

Comment on lines +84 to +87
// MetricsVersionTag is the SDK version/ref to report in metrics.
// This is used by the sidecar's /info endpoint and is NOT passed to the worker.
// If empty, falls back to the --version flag value.
MetricsVersionTag string
Member

Why would we want this to be different from version?

Contributor Author

--version can be a path to a local repo (which is what we use in our cloud worker testing); we'd prefer a git commit hash or something else that actually identifies the version.

Member

Ah. I see.

Comment on lines 57 to 58
// Give time for shutdown
time.Sleep(100 * time.Millisecond)
Member

These should instead just use an eventually kind of thing on the subsequent assertion (also more boring comments)

Contributor Author

replaced with a util called eventually (assertion with a timeout)

@THardy98
Contributor Author

updated README to include some info on the metrics sidecar

@THardy98 THardy98 requested a review from Sushisource January 13, 2026 22:21
@THardy98 THardy98 merged commit e33cdd0 into main Jan 14, 2026
28 checks passed
@THardy98 THardy98 deleted the add-process-metrics-sidecar branch January 14, 2026 00:50