
DRAFT: Feat agentruntime controller #218

Draft

varshaprasad96 wants to merge 4 commits into kagenti:main from varshaprasad96:feat-agentruntime-controller

Conversation


@varshaprasad96 commented Mar 11, 2026

Summary

Implements Phase 1 of the AgentRuntime epic (kagenti/kagenti#862) — the CRD definition, controller with label management, config hash computation, and tests.

AgentRuntime is the declarative way to enroll a workload into the Kagenti platform. Instead of manually adding kagenti.io/type labels to Deployment manifests, developers create an AgentRuntime CR with a targetRef pointing to their workload. The controller applies labels and config-hash annotations to the PodTemplateSpec, triggering rolling updates when configuration changes.
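For illustration, a minimal AgentRuntime CR might look like the following. Only `targetRef` and the `kagenti.io/*` keys come from this PR's description; the API group/version and the remaining field layout are assumptions, not the actual CRD schema:

```yaml
# Hypothetical sketch — apiVersion and spec layout are assumptions.
apiVersion: kagenti.io/v1alpha1
kind: AgentRuntime
metadata:
  name: my-agent
spec:
  targetRef:
    kind: Deployment      # Deployment or StatefulSet per the PR
    name: my-agent-deployment
```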

What's included

  • AgentRuntime controller (internal/controller/agentruntime_controller.go)
    • Resolves targetRef to Deployment/StatefulSet
    • Applies kagenti.io/type label to workload metadata and PodTemplateSpec
    • Applies kagenti.io/config-hash annotation to PodTemplateSpec (triggers rolling updates)
    • Applies app.kubernetes.io/managed-by: kagenti-operator to workload metadata
    • Counts configured pods and updates status with phase and conditions
    • Finalizer (kagenti.io/cleanup): preserves type label on deletion, updates config-hash to defaults-only
  • Config hash computation (internal/controller/config_hash.go)
    • Merges AgentRuntime spec with platform defaults from kagenti-webhook-defaults ConfigMap
    • Deterministic SHA256 hash via sorted-key JSON serialization
    • Defaults-only hash for CR deletion (triggers rollback to platform defaults)
  • Documentation — Updated docs/api-reference.md and docs/architecture.md
  • Tests — 10 unit tests (config hash) + 7 integration tests (controller lifecycle via envtest)
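The deterministic hash described above can be sketched as follows. Function and key names are illustrative, not the PR's actual API; the key property is that `encoding/json` marshals map keys in sorted order, so the same merged config always yields the same digest:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// computeConfigHash merges spec values over platform defaults and hashes
// the canonical JSON form. encoding/json emits map keys in sorted order,
// so the output is deterministic for a given merged map.
func computeConfigHash(defaults, spec map[string]string) (string, error) {
	merged := make(map[string]string, len(defaults)+len(spec))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range spec { // spec wins over defaults
		merged[k] = v
	}
	data, err := json.Marshal(merged)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	defaults := map[string]string{"trace": "off", "type": "agent"}
	spec := map[string]string{"trace": "on"}
	h1, _ := computeConfigHash(defaults, spec)
	h2, _ := computeConfigHash(defaults, spec)
	fmt.Println(h1 == h2) // deterministic across calls

	// Passing a nil spec gives the defaults-only hash used on CR deletion.
	d, _ := computeConfigHash(defaults, nil)
	fmt.Println(h1 != d) // spec changes the hash
}
```

Calling the function with a nil spec models the defaults-only hash that the controller writes during deletion, which is what triggers the rollback rolling update.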

How it works

  1. Developer deploys a standard Deployment (no kagenti labels)
  2. Developer creates an AgentRuntime CR with targetRef → Deployment
  3. Controller applies labels + config-hash → PodTemplateSpec change → rolling update
  4. On update: config-hash changes → rolling update with new config
  5. On delete: type label preserved, config-hash set to defaults-only → rolling update
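The label/annotation application in steps 2–4 can be sketched as a pure helper. The `kagenti.io/type` and `kagenti.io/config-hash` keys follow the PR description; the function itself (and its no-op detection, which mirrors the "compare original labels before mutation" review fix below) is an illustrative sketch, not the PR's code:

```go
package main

import "fmt"

// applyRuntimeConfig returns updated label and annotation maps for a
// PodTemplateSpec, plus whether anything changed, so the controller can
// skip no-op workload updates. The change check compares the ORIGINAL
// values before mutation.
func applyRuntimeConfig(labels, annotations map[string]string, runtimeType, configHash string) (map[string]string, map[string]string, bool) {
	outLabels := map[string]string{}
	for k, v := range labels {
		outLabels[k] = v
	}
	outAnnotations := map[string]string{}
	for k, v := range annotations {
		outAnnotations[k] = v
	}
	changed := outLabels["kagenti.io/type"] != runtimeType ||
		outAnnotations["kagenti.io/config-hash"] != configHash
	outLabels["kagenti.io/type"] = runtimeType
	outAnnotations["kagenti.io/config-hash"] = configHash
	return outLabels, outAnnotations, changed
}

func main() {
	l, a, changed := applyRuntimeConfig(nil, nil, "agent", "abc123")
	fmt.Println(l["kagenti.io/type"], a["kagenti.io/config-hash"], changed)

	// Reapplying the same config is detected as a no-op.
	_, _, changed = applyRuntimeConfig(l, a, "agent", "abc123")
	fmt.Println(changed)
}
```

Because any change to these PodTemplateSpec maps alters the pod template, Kubernetes performs the rolling update described in step 3 without the controller touching pods directly.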

What's NOT included (follow-up PRs)

  • Webhook coordination (Phase 2 — depends on kagenti-extensions)
  • ConfigMap watch for defaults changes
  • Field indexer for targetRef lookups
  • Helm chart updates (Phase 3)
  • E2E tests (Phase 4)

Related issue(s)

kagenti/kagenti#862

Fixes #

Add deterministic SHA256 config hash computation that merges
AgentRuntime spec fields with platform defaults from the
kagenti-webhook-defaults ConfigMap. The hash is used as a
PodTemplateSpec annotation to trigger rolling updates when
configuration changes.

- ComputeConfigHash: merges spec + defaults into canonical JSON
- ComputeDefaultsOnlyHash: defaults-only hash for CR deletion
- Deterministic output via sorted-key JSON serialization
- Graceful fallback when defaults ConfigMap is missing

Includes unit tests covering determinism, change detection across
type/trace/identity fields, defaults-only vs spec+defaults
differentiation, and missing ConfigMap handling.

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Implement the AgentRuntime reconciler that watches AgentRuntime CRs
and applies configuration to target workloads via targetRef.

Controller responsibilities:
- Resolve targetRef to Deployment/StatefulSet (duck typing)
- Apply kagenti.io/type label to workload metadata and PodTemplateSpec
- Apply kagenti.io/config-hash annotation to PodTemplateSpec to
  trigger rolling updates on config changes
- Apply app.kubernetes.io/managed-by label to workload metadata
- Count configured pods and update status (phase, conditions)
- Finalizer (kagenti.io/cleanup) preserves type label on deletion
  and updates config-hash to defaults-only

Also:
- Wire controller into cmd/main.go
- Regenerate RBAC manifests for agentruntimes resources

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Add Ginkgo/envtest integration tests covering the AgentRuntime
controller lifecycle:

- Finalizer added on first reconcile
- Labels and config-hash applied to target Deployment
- Status set to Active after successful configuration
- Config-hash updates when spec changes (trace config)
- Error status with TargetNotFound when target is missing
- Deletion preserves type label, updates config-hash to defaults,
  removes managed-by label
- Tool runtime type correctly applied

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
MUST FIX:
- Use time.Second instead of raw nanosecond arithmetic for RequeueAfter
- Fix no-op check in applyWorkloadConfig to compare original labels
  before mutation, not after
- Fix handleDeletion to properly check NotFound before updating
  workload (was using client.IgnoreNotFound which allowed updates
  on zero-value objects)
- Remove unused unstructured parameter from applyWorkloadConfig
  (resolveTargetRef now only checks existence)

SHOULD FIX:
- Rename LabelConfigHash to AnnotationConfigHash (it's an annotation)
- Use consistent RequeueAfter strategy for all transient errors
- Remove unused ctx parameter from setPhase
- Simplify config_hash.go: remove sortedDefaults copy and marshalSorted
  double-serialization (encoding/json already sorts map keys)
- Document isPodOwnedByWorkload prefix-matching limitation
- Use kebab-case controller name ("agentruntime" not "AgentRuntime")

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
@varshaprasad96 force-pushed the feat-agentruntime-controller branch from 8a4626a to f526405 on March 12, 2026 00:02
