Pod Log Retrieval API & Validation Failure Log Preservation#350
Pod Log Retrieval API & Validation Failure Log Preservation#350paras-agarwal-meesho wants to merge 4 commits intomainfrom
Conversation
WalkthroughAdds non‑streaming ArgoCD application log retrieval with an API endpoint and handler; integrates failure log capture into Predator validation, formatting degraded pod logs and uploading them to GCS; and introduces comprehensive tests for both features. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Controller as InfrastructureController
participant Handler as InfrastructureHandler
participant ArgoCD as ArgoCD API
Client->>Controller: GET /applications/:appName/logs
activate Controller
Controller->>Controller: validate appName, workingEnv, query params
Controller->>Handler: GetApplicationLogs(appName, workingEnv, opts)
activate Handler
Handler->>Handler: derive ArgoCD app name, build request
Handler->>ArgoCD: HTTP GET logs (NDJSON)
activate ArgoCD
ArgoCD-->>Handler: NDJSON log entries
deactivate ArgoCD
Handler-->>Controller: []ApplicationLogEntry
deactivate Handler
Controller-->>Client: 200 OK with entries
deactivate Controller
sequenceDiagram
participant Predator
participant InfraHandler as InfrastructureHandler
participant ArgoCD as ArgoCD API
participant GCS as GCS Storage
Predator->>InfraHandler: findDegradedPods(serviceName)
activate InfraHandler
InfraHandler->>InfraHandler: fetch resource details
InfraHandler-->>Predator: list of degraded pods
deactivate InfraHandler
loop for each degraded pod
Predator->>InfraHandler: fetchDegradedPodLogs(serviceName, podName) (Previous=true)
activate InfraHandler
InfraHandler->>ArgoCD: GetApplicationLogs(...)
activate ArgoCD
ArgoCD-->>InfraHandler: log entries
deactivate ArgoCD
InfraHandler->>InfraHandler: convert timestamps to IST, format pod logs
InfraHandler-->>Predator: formatted pod logs
deactivate InfraHandler
end
Predator->>Predator: formatValidationLogsPlainText
Predator->>Predator: buildValidationLogsGCSPath
Predator->>GCS: UploadFile(report, gcsPath)
activate GCS
GCS-->>Predator: public GCS URL
deactivate GCS
Predator->>Predator: updateRequestValidationStatus(groupID, success=false, gcsLogPath)
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@horizon/internal/infrastructure/controller/controller.go`:
- Around line 177-187: The handler currently allows tailLines == 0 but
downstream URL builder coerces non-positive values to 1000; make the contract
consistent by rejecting 0 at the API layer: in the block that parses q.TailLines
(the tailLines variable and strconv.ParseInt usage around q.TailLines), change
the validation to require tailLines > 0 and return a 400 with a message like
"tailLines must be a positive integer" when it's 0 or negative; this preserves
caller intent and aligns with the URL builder behavior.
In `@horizon/internal/predator/handler/predator_validation_logs.go`:
- Around line 183-190: Sanitize and normalize job.ServiceName before composing
the GCS object key in buildValidationLogsGCSPath: create a sanitized variable
(e.g., sanitizedService := strings.ToLower(job.ServiceName)) and then strip or
replace unsafe characters (remove any occurrences of "..", replace path
separators like "/" and "\" with "-" or remove them, and restrict to
alphanumerics, dashes and underscores—e.g., via a regexp whitelist) so the
objectPath uses sanitizedService instead of job.ServiceName when calling
path.Join; update references in buildValidationLogsGCSPath accordingly to
prevent traversal and ensure consistent lowercase keys.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
.gitignorehorizon/internal/infrastructure/controller/controller.gohorizon/internal/infrastructure/controller/controller_test.gohorizon/internal/infrastructure/handler/handler.gohorizon/internal/infrastructure/router/router.gohorizon/internal/predator/handler/predator_validation.gohorizon/internal/predator/handler/predator_validation_logs.gohorizon/internal/predator/handler/predator_validation_logs_test.gohorizon/pkg/argocd/logs.gohorizon/pkg/argocd/logs_test.go
| tailLines := int64(1000) | ||
| if q.TailLines != "" { | ||
| var err error | ||
| tailLines, err = strconv.ParseInt(q.TailLines, 10, 64) | ||
| if err != nil || tailLines < 0 { | ||
| ctx.JSON(http.StatusBadRequest, gin.H{"error": gin.H{ | ||
| "message": "tailLines must be a non-negative integer", | ||
| }}) | ||
| return | ||
| } | ||
| } |
There was a problem hiding this comment.
tailLines=0 is accepted here but not honored downstream.
Line 181 accepts 0, but the URL builder later coerces non-positive values to 1000, so caller intent is lost. Please make this contract consistent (either reject 0 here or preserve 0 end-to-end).
One consistent option (reject 0 at API layer)
if q.TailLines != "" {
var err error
tailLines, err = strconv.ParseInt(q.TailLines, 10, 64)
- if err != nil || tailLines < 0 {
+ if err != nil || tailLines <= 0 {
ctx.JSON(http.StatusBadRequest, gin.H{"error": gin.H{
- "message": "tailLines must be a non-negative integer",
+ "message": "tailLines must be a positive integer",
}})
return
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tailLines := int64(1000) | |
| if q.TailLines != "" { | |
| var err error | |
| tailLines, err = strconv.ParseInt(q.TailLines, 10, 64) | |
| if err != nil || tailLines < 0 { | |
| ctx.JSON(http.StatusBadRequest, gin.H{"error": gin.H{ | |
| "message": "tailLines must be a non-negative integer", | |
| }}) | |
| return | |
| } | |
| } | |
| tailLines := int64(1000) | |
| if q.TailLines != "" { | |
| var err error | |
| tailLines, err = strconv.ParseInt(q.TailLines, 10, 64) | |
| if err != nil || tailLines <= 0 { | |
| ctx.JSON(http.StatusBadRequest, gin.H{"error": gin.H{ | |
| "message": "tailLines must be a positive integer", | |
| }}) | |
| return | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@horizon/internal/infrastructure/controller/controller.go` around lines 177 -
187, The handler currently allows tailLines == 0 but downstream URL builder
coerces non-positive values to 1000; make the contract consistent by rejecting 0
at the API layer: in the block that parses q.TailLines (the tailLines variable
and strconv.ParseInt usage around q.TailLines), change the validation to require
tailLines > 0 and return a 400 with a message like "tailLines must be a positive
integer" when it's 0 or negative; this preserves caller intent and aligns with
the URL builder behavior.
| func (p *Predator) buildValidationLogsGCSPath(job *validationjob.Table) (bucket, objectPath string) { | ||
| if p.isNonProductionEnvironment() { | ||
| bucket = pred.GcsModelBucket | ||
| objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log") | ||
| } else { | ||
| bucket = pred.GcsConfigBucket | ||
| objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log") | ||
| } |
There was a problem hiding this comment.
Sanitize and normalize job.ServiceName before composing the GCS object key.
Line 186 and Line 189 build object paths from raw job.ServiceName. If it contains /, .., or mixed case, keys can become inconsistent or escape the intended prefix layout.
Proposed fix
func (p *Predator) buildValidationLogsGCSPath(job *validationjob.Table) (bucket, objectPath string) {
+ serviceName := strings.ToLower(strings.TrimSpace(job.ServiceName))
+ if serviceName == "" || strings.Contains(serviceName, "/") || strings.Contains(serviceName, "..") {
+ return "", ""
+ }
+ objectPath = path.Join(validationLogsDir, serviceName, job.GroupID+".log")
+
if p.isNonProductionEnvironment() {
bucket = pred.GcsModelBucket
- objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log")
} else {
bucket = pred.GcsConfigBucket
- objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log")
}
return bucket, objectPath
}Based on learnings: service names (inferflow, predator, numerix) are expected to be consistently lowercase across Horizon code paths.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (p *Predator) buildValidationLogsGCSPath(job *validationjob.Table) (bucket, objectPath string) { | |
| if p.isNonProductionEnvironment() { | |
| bucket = pred.GcsModelBucket | |
| objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log") | |
| } else { | |
| bucket = pred.GcsConfigBucket | |
| objectPath = path.Join(validationLogsDir, job.ServiceName, job.GroupID+".log") | |
| } | |
| func (p *Predator) buildValidationLogsGCSPath(job *validationjob.Table) (bucket, objectPath string) { | |
| serviceName := strings.ToLower(strings.TrimSpace(job.ServiceName)) | |
| if serviceName == "" || strings.Contains(serviceName, "/") || strings.Contains(serviceName, "..") { | |
| return "", "" | |
| } | |
| objectPath = path.Join(validationLogsDir, serviceName, job.GroupID+".log") | |
| if p.isNonProductionEnvironment() { | |
| bucket = pred.GcsModelBucket | |
| } else { | |
| bucket = pred.GcsConfigBucket | |
| } | |
| return bucket, objectPath | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@horizon/internal/predator/handler/predator_validation_logs.go` around lines
183 - 190, Sanitize and normalize job.ServiceName before composing the GCS
object key in buildValidationLogsGCSPath: create a sanitized variable (e.g.,
sanitizedService := strings.ToLower(job.ServiceName)) and then strip or replace
unsafe characters (remove any occurrences of "..", replace path separators like
"/" and "\" with "-" or remove them, and restrict to alphanumerics, dashes and
underscores—e.g., via a regexp whitelist) so the objectPath uses
sanitizedService instead of job.ServiceName when calling path.Join; update
references in buildValidationLogsGCSPath accordingly to prevent traversal and
ensure consistent lowercase keys.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@horizon/internal/predator/handler/predator_validation_logs_test.go`:
- Around line 62-64: The test package already defines a type named mockGCSClient
causing a redeclaration; rename this local mock type (mockGCSClient) to a unique
identifier such as mockGCSClientForValidationLogs and update all references in
predator_validation_logs_test.go (constructors, variable literals, and method
calls) accordingly, or alternatively move the mock definition into a shared test
helper file (e.g., mocks_test.go) and remove the duplicate type here so both
tests reuse the single mock definition.
| type mockGCSClient struct { | ||
| uploadFileFn func(bucket, objectPath string, data []byte) error | ||
| } |
There was a problem hiding this comment.
Fix: mockGCSClient is already declared in another file in this package.
The pipeline failure indicates this type is redeclared. Since all _test.go files in the same package share a namespace, you need to either:
- Rename this mock to something unique like
mockGCSClientForValidationLogs - Consolidate mocks into a shared test helper file (e.g.,
mocks_test.go) and reuse across tests
Option 1: Rename the mock
-type mockGCSClient struct {
+type mockValidationLogsGCSClient struct {
uploadFileFn func(bucket, objectPath string, data []byte) error
}
-func (m *mockGCSClient) ReadFile(_, _ string) ([]byte, error) { return nil, nil }
+func (m *mockValidationLogsGCSClient) ReadFile(_, _ string) ([]byte, error) { return nil, nil }
// ... update all other method receivers similarly🧰 Tools
🪛 GitHub Actions: Horizon CI
[error] 62-62: mockGCSClient redeclared in this block
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@horizon/internal/predator/handler/predator_validation_logs_test.go` around
lines 62 - 64, The test package already defines a type named mockGCSClient
causing a redeclaration; rename this local mock type (mockGCSClient) to a unique
identifier such as mockGCSClientForValidationLogs and update all references in
predator_validation_logs_test.go (constructors, variable literals, and method
calls) accordingly, or alternatively move the mock definition into a shared test
helper file (e.g., mocks_test.go) and remove the duplicate type here so both
tests reuse the single mock definition.
🔁 Pull Request Template – BharatMLStack
Context:
Model validation failures are currently difficult to debug because the Kubernetes pod logs of the test deployable are ephemeral and lost once the pod or deployable is recycled.
This solution introduces:
The goal is to eliminate dependency on direct Argo CD or cluster access and provide durable, auditable failure diagnostics.
Describe your changes:
Testing:
Checklist before requesting a review
📂 Modules Affected
horizon(Real-time systems / networking)online-feature-store(Feature serving infra)trufflebox-ui(Admin panel / UI)infra(Docker, CI/CD, GCP/AWS setup)docs(Documentation updates)___________✅ Type of Change
___________📊 Benchmark / Metrics (if applicable)
Summary by CodeRabbit
New Features
Tests