Caution: Review failed. The pull request is closed.

Walkthrough

Adds extensive Kubernetes manifests and supporting changes to deploy microservices, observability, and infrastructure; renames the router service, introduces a gateway health endpoint and routing fallback, updates load-test endpoints and auth header handling, and adjusts CORS and OTEL configuration.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Ingress as nginx-ingress
    participant Router as router-service
    participant Gateway as gateway-service
    participant ServiceA as flights/aircraft/search...
    Note right of Client: External request to api.aviation.local
    Client->>Ingress: HTTP request (path /health or /service/...)
    alt /health on gateway
        Ingress->>Gateway: GET /health
        Gateway->>Gateway: health_handler -> return {"status":"UP"}
        Gateway-->>Ingress: 200 JSON
        Ingress-->>Client: 200 JSON
    else other paths
        Ingress->>Router: forward request (hot-reload supergraph)
        Router->>ServiceA: route to appropriate backend (flights/aircraft/search)
        ServiceA-->>Router: response
        Router-->>Ingress: response
        Ingress-->>Client: response
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 58
🧹 Nitpick comments (31)
k8s/messaging/schema-loader.yml (1)
34-42: Consider using a pre-built image with dependencies included.

Installing `curl` and `jq` at runtime on every Job execution adds unnecessary latency and external dependencies (apk repositories). Consider building a custom image with these tools pre-installed for faster, more reliable execution. Example Dockerfile:

```dockerfile
FROM alpine:3.20@sha256:<digest>
RUN apk add --no-cache curl jq
COPY script.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/script.sh
```

k8s/messaging/kafka.yml (2)
39-39: Pin the Kafka image by digest for reproducible deployments.

Using `apache/kafka:4.0.0` without a digest allows the image content to change if the tag is re-pushed, potentially introducing unexpected behaviour or security vulnerabilities.

```diff
-          image: apache/kafka:4.0.0
+          image: apache/kafka:4.0.0@sha256:<digest>
```

Run `docker pull apache/kafka:4.0.0` and `docker inspect` to obtain the current digest, or use `kubectl` to retrieve it from your registry.
52-53: Document the hardcoded Kafka CLUSTER_ID.

The CLUSTER_ID `7sWhx1QdSqGGl3a57SSZXA` appears to be hardcoded. In KRaft mode, this ID must be consistent across all brokers but should be documented or managed through configuration to prevent accidental changes during cluster recreation. Consider adding a comment explaining the ID's origin and whether it should remain fixed:

```yaml
# Cluster ID - must remain consistent for this Kafka cluster
# Generated with: kafka-storage.sh random-uuid
- name: CLUSTER_ID
  value: "7sWhx1QdSqGGl3a57SSZXA"
```

k8s/aircraft/database.yml (2)
44-48: Consider adding a liveness probe for PostgreSQL.

Whilst the readiness probe correctly checks database availability, adding a liveness probe would enable Kubernetes to restart the container if PostgreSQL becomes unresponsive (e.g., deadlocked).

```diff
           readinessProbe:
             exec:
               command: [ "pg_isready","-U","postgres","-d","aircraft" ]
             initialDelaySeconds: 10
             periodSeconds: 5
+          livenessProbe:
+            exec:
+              command: [ "pg_isready","-U","postgres","-d","aircraft" ]
+            initialDelaySeconds: 30
+            periodSeconds: 10
+            timeoutSeconds: 5
+            failureThreshold: 3
```
31-31: Pin the PostgreSQL image by digest.

Using `postgres:17.6` without a digest allows image content changes if the tag is updated.

```diff
-          image: postgres:17.6
+          image: postgres:17.6@sha256:<digest>
```

k8s/flights/README.md (1)
8-8: Consider omitting or referencing secrets for credentials.

Whilst this is documentation, explicitly stating credentials as "postgres/postgres" could encourage weak password usage. Consider referencing that credentials are managed via Kubernetes Secrets instead of documenting the actual values.

```diff
-**Credentials**: postgres/postgres
+**Credentials**: Managed via Kubernetes Secrets (see `secrets.yml`)
```

load-tests/src/tests/shells/flights.ts (1)
27-28: Consider more explicit header strategy configuration.

Whilst the URL-based detection works, using `!url.includes("/gateway")` is fragile and couples the authentication strategy to URL patterns. If URL structures change or additional routing patterns are introduced, this logic could break silently. Consider passing `useDirectHeaders` as an explicit parameter to `runFlightScenario`, or deriving it from a configuration object rather than inferring it from the URL string:

```typescript
export function runFlightScenario(
  url: string,
  aircraftId: string,
  accessToken?: string,
  authContext?: AuthContext,
  useDirectHeaders: boolean = true, // Explicit parameter
) {
  // ...
}
```

load-tests/src/tests/shells/aircraft.ts (1)
21-22: Consider more explicit header strategy configuration.

This has the same URL-based detection fragility as in `flights.ts`. Using `!url.includes("/gateway")` couples the authentication strategy to URL patterns, which could break if URL structures evolve. Consider passing `useDirectHeaders` as an explicit parameter:

```typescript
export function runAircraftScenario(
  url: string,
  accessToken?: string,
  authContext?: AuthContext,
  useDirectHeaders: boolean = true, // Explicit parameter
) {
  // ...
}
```

k8s/monitoring/jaeger.yml (1)
24-55: Add security context to harden container posture.

Checkov has flagged that this deployment allows privilege escalation and permits root container admission. For production deployments, add a security context to enforce non-root execution and prevent privilege escalation.

Apply this diff to add security hardening:

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        fsGroup: 1000
       containers:
         - name: jaeger
           image: jaegertracing/all-in-one:1.65.0
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: false
+            capabilities:
+              drop:
+                - ALL
```

Note: Verify that the Jaeger image supports running as a non-root user, as some container images may require root privileges.
k8s/aircraft/README.md (1)
1-29: Update documentation to highlight security considerations.

The README documents placeholder credentials (postgres/postgres) and does not mention the use of Kubernetes Secrets for sensitive data. Add a note clarifying that credentials should be managed via Secrets in production.

Consider adding a Security section:

```markdown
#### Security

- **Database Credentials**: Use Kubernetes Secrets to manage database passwords in production
- **Image Registry**: Ensure container images are sourced from trusted registries
- **Pod Security**: Apply Pod Security Standards appropriate for your environment
```

k8s/aircraft/service.yml (1)
20-98: Add security context to enforce non-root execution and prevent privilege escalation.

Similar to other deployments in this PR, this deployment lacks a security context. For production workloads, enforce non-root execution.

Apply this diff to add security hardening:

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        fsGroup: 1000
       initContainers:
```

k8s/search/dashboards.yml (2)
16-67: Add security context to enforce non-root execution and prevent privilege escalation.

This deployment lacks a security context. Enforce non-root execution for the dashboards container.

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1000
+        fsGroup: 1000
       containers:
```
60-67: Missing newline at end of file.

YAML files should end with a newline character. Add one after the final `---` marker.

k8s/aircraft/cache.yml (1)
12-43: Add security context to enforce non-root execution and prevent privilege escalation.

This deployment lacks a security context. Add proper security settings for the Redis container.

```diff
     spec:
+      securityContext:
+        fsGroup: 999
       containers:
         - name: redis
           image: redis:8.2
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 999
+            capabilities:
+              drop:
+                - ALL
           ports:
```

k8s/monitoring/promethus.yml (2)
15-65: Add security context to enforce non-root execution and prevent privilege escalation.

This deployment lacks security hardening. Add a security context to prevent root execution.

```diff
     spec:
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 65534
+        fsGroup: 65534
       initContainers:
         - name: fetch-config
           image: alpine/git:2.49.1
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
           command:
```
36-42: Add error handling to the shell script in initContainer.

The shell script lacks error handling. If any step fails (git clone, mkdir, cp), the container will still exit with success, potentially causing Prometheus to start without a valid configuration.

Apply this diff to add error handling:

```diff
           command:
             - sh
             - -c
             - |
+              set -e
               echo "Cloning configuration..."
               git clone --branch main https://github.com/edinstance/distributed-aviation-system.git /repo
               echo "Copying prometheus config files..."
               mkdir -p /config
               cp /repo/monitoring/prometheus/prometheus.yml /config/config.yaml
+              echo "Verifying config file..."
+              test -f /config/config.yaml || exit 1
               echo "Config ready."
```

k8s/03-ingress.yml (1)
10-103: Consider consolidating duplicate path routing to reduce configuration verbosity.

Each service has both a regex pattern (`/service/(.*)`) and a prefix pattern (`/service`). The prefix pattern alone would suffice, as nginx Ingress will route `/service/anything` via the Prefix type. The regex patterns add configuration overhead without clear benefit. Consider removing the regex patterns and keeping only Prefix routing:

```diff
           paths:
-            - path: /router/(.*)
-              pathType: ImplementationSpecific
-              backend:
-                service:
-                  name: router-service
-                  port:
-                    number: 4000
             - path: /router
               pathType: Prefix
               backend:
                 service:
                   name: router-service
                   port:
                     number: 4000
```

Repeat this simplification for the remaining services (flights, aircraft, auth, gateway, search).
k8s/messaging/schema-registry.yml (1)
14-58: Add health checks and security context to the deployment.

The deployment lacks readiness and liveness probes, which are important for orchestration and reliability. Additionally, no security context is defined.

Apply this diff to add health checks and security hardening:

```diff
         - name: schema-registry
           image: confluentinc/cp-schema-registry:8.1.0
+          securityContext:
+            allowPrivilegeEscalation: false
+            runAsNonRoot: true
+            runAsUser: 1000
+            capabilities:
+              drop:
+                - ALL
           ports:
             - containerPort: 8081
           env:
             - name: SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS
               value: "kafka.messaging.svc.cluster.local:9092"
             - name: SCHEMA_REGISTRY_LISTENERS
               value: "http://0.0.0.0:8081"
             - name: SCHEMA_REGISTRY_HOST_NAME
               valueFrom:
                 fieldRef:
                   fieldPath: metadata.name
             - name: SCHEMA_REGISTRY_LOG4J_ROOT_LOGLEVEL
               value: "INFO"
             - name: SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR
               value: "1"
             - name: SCHEMA_REGISTRY_KAFKASTORE_SECURITY_PROTOCOL
               value: "PLAINTEXT"
             - name: SCHEMA_REGISTRY_SCHEMA_COMPATIBILITY_LEVEL
               value: "BACKWARD"
+          readinessProbe:
+            httpGet:
+              path: /subjects
+              port: 8081
+            initialDelaySeconds: 30
+            periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /subjects
+              port: 8081
+            initialDelaySeconds: 60
+            periodSeconds: 30
           resources:
             requests:
               cpu: 250m
               memory: 256Mi
             limits:
               cpu: 500m
               memory: 512Mi
```

k8s/search/service.yml (1)
66-71: Consider adding a liveness probe.

Whilst a readiness probe is defined, there's no liveness probe. Adding one enables Kubernetes to automatically restart pods that become unresponsive.

Add a liveness probe after the readiness probe:

```diff
           readinessProbe:
             httpGet:
               path: /details/health
               port: 8082
             initialDelaySeconds: 10
             periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /details/health
+              port: 8082
+            initialDelaySeconds: 30
+            periodSeconds: 30
+            timeoutSeconds: 5
           resources:
```

k8s/monitoring/mimir.yml (1)
63-65: Consider using a ConfigMap for configuration.

Whilst the current approach of fetching configuration via an initContainer works, using a Kubernetes ConfigMap would be more idiomatic, eliminate the Git dependency at runtime, and allow faster pod startup. This also provides better configuration versioning through Kubernetes-native mechanisms.
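To make that concrete, a minimal sketch of the ConfigMap approach follows; the ConfigMap name, mount path, and the idea of pasting the repository's Mimir config into the `data` key are all assumptions for illustration, not values taken from the manifests under review.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mimir-config            # assumed name
  namespace: monitoring
data:
  config.yaml: |
    # paste the contents of the repository's Mimir config file here
---
# In the Deployment, drop the git-clone initContainer and mount the ConfigMap instead:
# spec:
#   volumes:
#     - name: config
#       configMap:
#         name: mimir-config
#   containers:
#     - name: mimir
#       volumeMounts:
#         - name: config
#           mountPath: /etc/mimir   # assumed mount path
```

Updating the ConfigMap and rolling the Deployment then replaces the runtime Git dependency with a cluster-native, versionable object.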
k8s/search/opensearch.yml (1)
47-50: Cluster configuration is functional but consider HA improvements.

The current configuration uses a single initial cluster manager node (`opensearch-0`). For improved resilience in production, consider configuring both nodes as eligible cluster managers.

```diff
             - name: cluster.initial_cluster_manager_nodes
-              value: "opensearch-0"
+              value: "opensearch-0,opensearch-1"
```

k8s/router/service.yml (1)
57-62: Consider adding a liveness probe.

Whilst a readiness probe is configured, adding a liveness probe would enable Kubernetes to automatically restart the router if it becomes unresponsive.

```diff
           readinessProbe:
             httpGet:
               path: /health
               port: 8088
             initialDelaySeconds: 10
             periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: 8088
+            initialDelaySeconds: 30
+            periodSeconds: 30
+            timeoutSeconds: 5
           resources:
```

k8s/gateway/service.yml (1)
52-57: Consider adding a liveness probe.

Adding a liveness probe alongside the readiness probe enables Kubernetes to automatically restart the gateway if it becomes unresponsive.

```diff
           readinessProbe:
             httpGet:
               path: /health
               port: 4001
             initialDelaySeconds: 10
             periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: 4001
+            initialDelaySeconds: 30
+            periodSeconds: 30
+            timeoutSeconds: 5
           resources:
```

k8s/authentication/database.yml (1)
44-48: Consider adding a liveness probe.

Whilst a readiness probe is configured, adding a liveness probe would enable Kubernetes to automatically restart the database if it becomes unresponsive.

```diff
           readinessProbe:
             exec:
               command: [ "pg_isready","-U","postgres","-d","authentication" ]
             initialDelaySeconds: 10
             periodSeconds: 5
+          livenessProbe:
+            exec:
+              command: [ "pg_isready","-U","postgres","-d","authentication" ]
+            initialDelaySeconds: 30
+            periodSeconds: 30
+            timeoutSeconds: 5
           volumeMounts:
```

k8s/flights/database.yml (1)
44-48: Add livenessProbe to detect and restart hung PostgreSQL instances.

The readinessProbe ensures the pod is ready to serve traffic, but a livenessProbe is needed to restart the pod if PostgreSQL hangs or becomes unresponsive.

Apply this diff to add a livenessProbe:

```diff
           readinessProbe:
             exec:
               command: [ "pg_isready","-U","postgres","-d","flights" ]
             initialDelaySeconds: 10
             periodSeconds: 5
+          livenessProbe:
+            exec:
+              command: [ "pg_isready","-U","postgres","-d","flights" ]
+            initialDelaySeconds: 30
+            periodSeconds: 10
+            failureThreshold: 3
           volumeMounts:
```

k8s/aircraft/migrations.yml (1)
36-51: Consider extracting the migration Job into a shared Helm chart or Kustomize template.

The aircraft migrations Job closely mirrors k8s/authentication/migrations.yml. Both use identical patterns (wait-for-db, fetch-repo, liquibase update) with only namespace and changelog path varying. This violates the DRY principle and increases maintenance burden.
Consolidate the migration pattern into a reusable Kustomize component or Helm chart with parameterized namespace, database hostname, and changelog path. This reduces duplication across flights, aircraft, authentication, and any future services requiring database migrations.
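As a rough illustration of the Kustomize route, the sketch below assumes a shared `base/` directory holding a generic migrations Job plus per-service overlays; the file layout, Job name, and changelog path are hypothetical, not taken from this PR.

```yaml
# base/kustomization.yaml (hypothetical shared base)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - migrations-job.yml          # generic wait-for-db + fetch-repo + liquibase Job
---
# overlays/aircraft/kustomization.yaml (hypothetical per-service overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: aircraft
resources:
  - ../../base
patches:
  - target:
      kind: Job
      name: migrations          # assumed Job name in the base
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/env/0/value
        value: changelogs/aircraft/changelog.yml   # assumed changelog-path env var
```

Each new service then only needs a small overlay that sets its namespace and changelog path, rather than a full copy of the Job manifest.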
k8s/monitoring/otel.yml (1)
34-34: Single-replica OTEL collector introduces a single point of failure for observability.

Monitoring infrastructure should be resilient. A single replica means any pod eviction or restart will interrupt telemetry collection.

Increase replicas to at least 2 and consider adding pod anti-affinity to spread across nodes:

```diff
-  replicas: 1
+  replicas: 2
   selector:
     matchLabels:
       app: otel-collector
   template:
     metadata:
       labels:
         app: otel-collector
     spec:
+      affinity:
+        podAntiAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+            - weight: 100
+              podAffinityTerm:
+                labelSelector:
+                  matchExpressions:
+                    - key: app
+                      operator: In
+                      values:
+                        - otel-collector
+                topologyKey: kubernetes.io/hostname
```

k8s/flights/service.yml (1)
52-60: Pin busybox image tags and consider more specific/smaller init containers.

The init containers use unversioned `busybox` images (lines 32, 35), which will pull the latest tag by default. This introduces non-determinism and potential compatibility issues. Pin busybox to a specific version:

```diff
         - name: wait-for-db
-          image: busybox
+          image: busybox:1.36
           command: [ "sh", "-c", "until nc -z flights-database.flights.svc.cluster.local 5432; do sleep 2; done;" ]
         - name: wait-for-cache
-          image: busybox
+          image: busybox:1.36
           command: [ "sh", "-c", "until nc -z flights-cache.flights.svc.cluster.local 6379; do sleep 2; done;" ]
```

k8s/monitoring/grafana.yml (1)
36-42: Consider pinning the Git commit for reproducible deployments.

The init container clones from the `main` branch without verifying the commit hash, which could introduce unexpected configuration changes or malicious content between deployments. Consider pinning to a specific commit:

```diff
             - |
               echo "Cloning configuration..."
-              git clone --branch main https://github.com/edinstance/distributed-aviation-system.git /repo
+              git clone https://github.com/edinstance/distributed-aviation-system.git /repo
+              cd /repo
+              git checkout <specific-commit-hash>
+              cd /
               echo "Copying grafana config files..."
```

Alternatively, consider using a ConfigMap to store provisioning configuration directly in the cluster for better auditability and version control.
k8s/01-ingress-controller.yml (1)
510-618: Consider setting a non-zero TTL for admission Jobs.

Both admission Jobs have `ttlSecondsAfterFinished: 0`, causing immediate deletion after completion. This makes debugging certificate generation failures difficult, as logs and Job status are lost.

Apply this diff to retain Jobs for 24 hours:

```diff
       serviceAccountName: ingress-nginx-admission
-  ttlSecondsAfterFinished: 0
+  ttlSecondsAfterFinished: 86400
 ---
```

(Apply to both Job definitions at lines 562 and 618.)
k8s/authentication/service.yml (1)
48-58: Replace sleep-based wait with proper Job status check.

The `wait-for-migrations` init container uses a fixed 10-second sleep without verifying that migrations actually completed. If migrations take longer or fail, the service may start with an incomplete schema.

Consider using a Kubernetes Job status check via the API, or implement a readiness endpoint on the migrations Job that the init container can poll:

```yaml
        - name: wait-for-migrations
          image: bitnami/kubectl:latest
          securityContext:
            # ... add securityContext as noted above
          command:
            - sh
            - -c
            - |
              echo "Waiting for migrations Job to complete..."
              kubectl wait --for=condition=complete --timeout=300s job/authentication-migrations -n authentication
              echo "Migrations complete!"
```

This requires granting the ServiceAccount permissions to read Job status in the authentication namespace.
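A rough sketch of what that grant could look like, assuming the Deployment's pods run under a ServiceAccount named `authentication-service`; the Role and binding names are hypothetical and not part of this PR.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-reader                       # hypothetical name
  namespace: authentication
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch"]      # enough for `kubectl wait`
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: authentication-service-job-reader   # hypothetical name
  namespace: authentication
subjects:
  - kind: ServiceAccount
    name: authentication-service         # assumed ServiceAccount used by the Deployment
    namespace: authentication
roleRef:
  kind: Role
  name: job-reader
  apiGroup: rbac.authorization.k8s.io
```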
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (53)
- docker-compose.yml (1 hunks)
- k8s/00-namespace.yml (1 hunks)
- k8s/01-ingress-controller.yml (1 hunks)
- k8s/02-cross-namespace-services.yml (1 hunks)
- k8s/03-ingress.yml (1 hunks)
- k8s/README.md (1 hunks)
- k8s/aircraft/README.md (1 hunks)
- k8s/aircraft/cache.yml (1 hunks)
- k8s/aircraft/database.yml (1 hunks)
- k8s/aircraft/migrations.yml (1 hunks)
- k8s/aircraft/secrets.yml (1 hunks)
- k8s/aircraft/service.yml (1 hunks)
- k8s/authentication/database.yml (1 hunks)
- k8s/authentication/keygen.yml (1 hunks)
- k8s/authentication/migrations.yml (1 hunks)
- k8s/authentication/secrets.yml (1 hunks)
- k8s/authentication/service.yml (1 hunks)
- k8s/cross-namespace-services-aviation.yml (1 hunks)
- k8s/flights/README.md (1 hunks)
- k8s/flights/cache.yml (1 hunks)
- k8s/flights/database.yml (1 hunks)
- k8s/flights/migrations.yml (1 hunks)
- k8s/flights/secrets.yml (1 hunks)
- k8s/flights/service.yml (1 hunks)
- k8s/gateway/service.yml (1 hunks)
- k8s/messaging/kafka.yml (1 hunks)
- k8s/messaging/schema-loader.yml (1 hunks)
- k8s/messaging/schema-registry.yml (1 hunks)
- k8s/monitoring/grafana.yml (1 hunks)
- k8s/monitoring/jaeger.yml (1 hunks)
- k8s/monitoring/loki.yml (1 hunks)
- k8s/monitoring/mimir.yml (1 hunks)
- k8s/monitoring/otel.yml (1 hunks)
- k8s/monitoring/promethus.yml (1 hunks)
- k8s/router/router-config-job.yml (1 hunks)
- k8s/router/service.yml (1 hunks)
- k8s/router/supergraph-job.yml (1 hunks)
- k8s/search/dashboards.yml (1 hunks)
- k8s/search/init-job.yml (1 hunks)
- k8s/search/opensearch.yml (1 hunks)
- k8s/search/service.yml (1 hunks)
- load-tests/src/config.ts (1 hunks)
- load-tests/src/helpers/auth_headers.ts (1 hunks)
- load-tests/src/tests/shells/aircraft.ts (3 hunks)
- load-tests/src/tests/shells/flights.ts (4 hunks)
- monitoring/docker-compose.yml (0 hunks)
- router.yml (1 hunks)
- services/authentication/authentication/settings.py (2 hunks)
- services/authentication/docker-compose.yml (1 hunks)
- services/gateway/src/handling/health_handler.rs (1 hunks)
- services/gateway/src/handling/mod.rs (1 hunks)
- services/gateway/src/main.rs (2 hunks)
- supergraph.yml (1 hunks)
💤 Files with no reviewable changes (1)
- monitoring/docker-compose.yml
🧰 Additional context used
🧬 Code graph analysis (5)
services/gateway/src/handling/mod.rs (1)
services/gateway/src/handling/health_handler.rs (1)
health_handler(4-6)
load-tests/src/tests/shells/flights.ts (2)
load-tests/src/helpers/graphql_request.ts (1)
graphql(6-51)
load-tests/src/helpers/auth_headers.ts (1)
buildAuthHeaders(9-29)
load-tests/src/helpers/auth_headers.ts (1)
load-tests/src/types/auth_context.ts (1)
AuthContext(1-13)
load-tests/src/tests/shells/aircraft.ts (3)
load-tests/src/types/auth_context.ts (1)
AuthContext(1-13)
load-tests/src/helpers/graphql_request.ts (1)
graphql(6-51)
load-tests/src/helpers/auth_headers.ts (1)
buildAuthHeaders(9-29)
services/gateway/src/main.rs (2)
services/gateway/src/handling/route_handler.rs (1)
route_handler(22-150)
services/gateway/src/handling/health_handler.rs (1)
health_handler(4-6)
🪛 Checkov (3.2.334)
k8s/messaging/schema-registry.yml
[medium] 14-58: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 14-58: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/jaeger.yml
[medium] 24-55: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 24-55: Minimize the admission of root containers
(CKV_K8S_23)
k8s/search/dashboards.yml
[medium] 16-67: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 16-67: Minimize the admission of root containers
(CKV_K8S_23)
k8s/aircraft/database.yml
[medium] 13-66: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 13-66: Minimize the admission of root containers
(CKV_K8S_23)
k8s/router/service.yml
[medium] 24-83: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 24-83: Minimize the admission of root containers
(CKV_K8S_23)
k8s/messaging/kafka.yml
[medium] 21-118: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 21-118: Minimize the admission of root containers
(CKV_K8S_23)
k8s/authentication/migrations.yml
[medium] 1-77: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-77: Minimize the admission of root containers
(CKV_K8S_23)
k8s/search/service.yml
[medium] 15-78: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-78: Minimize the admission of root containers
(CKV_K8S_23)
k8s/flights/cache.yml
[medium] 12-43: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 12-43: Minimize the admission of root containers
(CKV_K8S_23)
k8s/authentication/keygen.yml
[medium] 1-51: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-51: Minimize the admission of root containers
(CKV_K8S_23)
k8s/messaging/schema-loader.yml
[medium] 1-54: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-54: Minimize the admission of root containers
(CKV_K8S_23)
k8s/gateway/service.yml
[medium] 15-64: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-64: Minimize the admission of root containers
(CKV_K8S_23)
k8s/router/supergraph-job.yml
[medium] 23-72: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 23-72: Minimize the admission of root containers
(CKV_K8S_23)
k8s/01-ingress-controller.yml
[high] 218-236: Minimize ClusterRoles that grant control over validating or mutating admission webhook configurations
(CKV_K8S_155)
[high] 257-276: No ServiceAccount/Node should be able to read all secrets
(CKV2_K8S_5)
k8s/search/opensearch.yml
[medium] 18-98: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 18-98: Minimize the admission of root containers
(CKV_K8S_23)
k8s/authentication/service.yml
[medium] 15-144: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-144: Minimize the admission of root containers
(CKV_K8S_23)
k8s/aircraft/migrations.yml
[medium] 1-71: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-71: Minimize the admission of root containers
(CKV_K8S_23)
k8s/aircraft/service.yml
[medium] 20-98: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 20-98: Minimize the admission of root containers
(CKV_K8S_23)
k8s/router/router-config-job.yml
[medium] 1-42: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-42: Minimize the admission of root containers
(CKV_K8S_23)
k8s/search/init-job.yml
[medium] 1-63: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-63: Minimize the admission of root containers
(CKV_K8S_23)
k8s/flights/migrations.yml
[medium] 1-51: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-51: Minimize the admission of root containers
(CKV_K8S_23)
k8s/authentication/database.yml
[medium] 13-66: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 13-66: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/otel.yml
[medium] 28-81: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 28-81: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/loki.yml
[medium] 15-41: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-41: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/promethus.yml
[medium] 15-65: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-65: Minimize the admission of root containers
(CKV_K8S_23)
k8s/aircraft/cache.yml
[medium] 12-43: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 12-43: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/mimir.yml
[medium] 15-65: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-65: Minimize the admission of root containers
(CKV_K8S_23)
k8s/monitoring/grafana.yml
[medium] 15-73: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-73: Minimize the admission of root containers
(CKV_K8S_23)
k8s/flights/service.yml
[medium] 15-73: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 15-73: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 44-45: Basic Auth Credentials
(CKV_SECRET_4)
k8s/flights/database.yml
[medium] 13-66: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 13-66: Minimize the admission of root containers
(CKV_K8S_23)
🪛 Gitleaks (8.28.0)
k8s/authentication/secrets.yml
[high] 2-8: Possible Kubernetes Secret detected, posing a risk of leaking credentials/tokens from your deployments
(kubernetes-secret-yaml)
[high] 11-17: Possible Kubernetes Secret detected, posing a risk of leaking credentials/tokens from your deployments
(kubernetes-secret-yaml)
k8s/flights/secrets.yml
[high] 2-8: Possible Kubernetes Secret detected, posing a risk of leaking credentials/tokens from your deployments
(kubernetes-secret-yaml)
k8s/aircraft/secrets.yml
[high] 2-8: Possible Kubernetes Secret detected, posing a risk of leaking credentials/tokens from your deployments
(kubernetes-secret-yaml)
🔇 Additional comments (26)
services/gateway/src/handling/mod.rs (1)
1-5: LGTM! The module structure is clean and correctly exports both handlers.
services/authentication/docker-compose.yml (1)
10-19: LGTM! Improved CORS security configuration.

The changes properly restrict CORS to specific origins whilst allowing credentials, which is a security improvement over the previous `CORS_ALLOW_ALL_ORIGINS: True` configuration. The addition of `host.docker.internal` supports Docker Desktop local development.

services/gateway/src/handling/health_handler.rs (1)
1-6: LGTM! The health check endpoint implementation is clean and follows best practices for Kubernetes readiness/liveness probes.
services/gateway/src/main.rs (1)
8-8: LGTM! Clean separation of health and application routing.

The introduction of a dedicated `/health` endpoint alongside a fallback handler for all other routes is a good pattern for Kubernetes deployments. The Arc clones in the closure are efficient and appropriate.

Also applies to: 46-55
k8s/authentication/keygen.yml (1)
7-7: Verify: Zero backoff limit prevents retries on transient failures.

Setting `backoffLimit: 0` means the Job will not retry if it fails due to transient issues (network errors, temporary Git unavailability, etc.). For critical key generation, this might leave the system in a broken state.

Is this intentional? If key generation must succeed exactly once (idempotency concerns), consider adding retry logic with idempotency checks in the script itself rather than disabling Kubernetes retries entirely.
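As a loose sketch only: the Job below keeps a non-zero `backoffLimit` and makes generation a no-op when keys already exist. The image, key paths, openssl invocation, and PVC name are assumptions for illustration, not the commands actually used by `keygen.yml`.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: authentication-keygen          # assumed name
  namespace: authentication
spec:
  backoffLimit: 3                      # allow retries on transient failures
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: keygen
          image: alpine/openssl:latest # assumed image
          command:
            - sh
            - -c
            - |
              set -e
              # Idempotency guard: skip generation if keys already exist (assumed path)
              if [ -f /keys/private.pem ]; then
                echo "Keys already exist, nothing to do."
                exit 0
              fi
              openssl genrsa -out /keys/private.pem 2048
              openssl rsa -in /keys/private.pem -pubout -out /keys/public.pem
          volumeMounts:
            - name: keys
              mountPath: /keys
      volumes:
        - name: keys
          persistentVolumeClaim:
            claimName: authentication-keys   # assumed PVC name
```

With the guard in place, retries become safe: a rerun after a transient failure either regenerates nothing or completes the interrupted work.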
k8s/messaging/kafka.yml (1)
109-118: Verify storage class availability and durability requirements.

The StatefulSet uses `storageClassName: local-path`, which typically provides node-local storage without replication. If a node fails, data on that node is lost. For a 3-replica Kafka cluster with `min.insync.replicas=2`, this may be acceptable, but ensure this aligns with your durability requirements.

Confirm that:

- The `local-path` storage class exists in your cluster
- Node-local storage is acceptable for your use case
- Kafka's replication factor (3) provides sufficient durability despite non-replicated storage
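For reference, this is roughly where the storage decision lives in a StatefulSet; the claim name, requested size, and the idea of swapping in a replicated storage class are illustrative assumptions rather than values from kafka.yml.

```yaml
volumeClaimTemplates:
  - metadata:
      name: kafka-data               # assumed claim name
    spec:
      accessModes: [ "ReadWriteOnce" ]
      # Swap local-path for a replicated storage class if losing a node must not lose data
      storageClassName: local-path
      resources:
        requests:
          storage: 10Gi              # assumed size
```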
services/authentication/authentication/settings.py (2)
212-217: Document the custom CORS headers and verify necessity.

`CORS_ALLOWED_HEADERS` includes `x-tenant-id`, which appears to be custom. Ensure this header is actually required by your frontend clients and that its inclusion doesn't inadvertently expose tenant information in insecure contexts.

Confirm:

- The `x-tenant-id` header is required for your multi-tenant architecture
- Clients properly set this header
- The backend validates tenant isolation based on this header
212-225: CORS credentials with environment-controlled origins require careful production configuration.

Allowing credentials (`CORS_ALLOW_CREDENTIALS = True`) demands strict origin controls. Ensure the `CORS_ALLOWED_ORIGINS` environment variable is properly configured in production to prevent credential leakage to unauthorised origins.

Verify that:

- `CORS_ALLOW_ALL_ORIGINS` is set to `False` in production (via environment)
- `CORS_ALLOWED_ORIGINS` contains only trusted frontend origins
- The combination of credentials + origins is tested to prevent security gaps
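For illustration only, a production-style environment block for the authentication Deployment might look like the following; the variable names mirror the Django settings discussed above, while the origin values are placeholders rather than anything defined in this PR.

```yaml
env:
  - name: CORS_ALLOW_ALL_ORIGINS
    value: "False"                 # never allow all origins when credentials are enabled
  - name: CORS_ALLOW_CREDENTIALS
    value: "True"
  - name: CORS_ALLOWED_ORIGINS
    # comma-separated list of trusted frontends (placeholder values)
    value: "https://app.example.com,https://admin.example.com"
```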
k8s/cross-namespace-services-aviation.yml (1)
1-32: Note: Port field in ExternalName services is informational only.

For `type: ExternalName` services, the `ports` field does not affect traffic routing; clients must connect directly to the port specified in the `externalName` DNS entry. The port declarations here serve only as documentation. This is acceptable for documentation purposes, but be aware that changing these port values won't affect actual connectivity. Consider adding comments to clarify this:

```yaml
spec:
  type: ExternalName
  externalName: flights-service.flights.svc.cluster.local
  ports:
    - port: 8081 # Documentation only - ExternalName services do not route traffic
```

k8s/00-namespace.yml (1)
1-43: LGTM! Clean namespace structure.

The namespace declarations are well-organised and follow Kubernetes conventions. The ingress-nginx namespace correctly includes the standard labels for ingress controller installations.
supergraph.yml (1)
4-12: Excellent use of environment variables for service URLs.

The migration to environment-variable-backed routing URLs with sensible fallbacks improves deployment flexibility across different environments whilst maintaining backwards compatibility.
k8s/02-cross-namespace-services.yml (1)
1-65: ExternalName services configured correctly.

The ExternalName service definitions properly enable cross-namespace routing by mapping service names in the ingress-nginx namespace to their actual locations in service-specific namespaces. The fully qualified domain names follow the correct pattern: `<service>.<namespace>.svc.cluster.local`.

Note: ExternalName services use the `port` field primarily for documentation; the actual port resolution happens at the target service.
1-29: Clear and comprehensive documentation.The README effectively documents the service stack components, their purposes, and interdependencies. This will be valuable for operators and developers working with the deployment.
load-tests/src/tests/shells/flights.ts (1)
13-13: Excellent refactoring to centralize header construction.

The introduction of `buildAuthHeaders` eliminates code duplication and ensures consistent authentication header handling across all GraphQL requests.

load-tests/src/tests/shells/aircraft.ts (1)
14-14: Excellent refactoring to centralize header construction.

The adoption of `buildAuthHeaders` maintains consistency with `flights.ts` and eliminates code duplication across the test suite.

k8s/messaging/schema-registry.yml (1)
46-48: Verify that replication factor of 1 is appropriate for your environment.

The Schema Registry uses `SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR: 1`, which means schema metadata is stored in a Kafka topic with no replicas. This is suitable for development but poses data loss risk in production. For production deployments, increase this to at least 2 or 3.

If this is a production cluster, update the replication factor:

```diff
             - name: SCHEMA_REGISTRY_KAFKASTORE_TOPIC_REPLICATION_FACTOR
-              value: "1"
+              value: "3"
```

k8s/monitoring/promethus.yml (1)
31-45: Repository path and configuration file verified; add error handling and confirm cluster network access.

The repository path and Prometheus configuration file have been verified to exist and contain valid YAML. However:

- Add error handling to the init script to fail fast if `git clone` or file copy operations fail; currently, if the clone fails, the script continues and Prometheus starts without a valid configuration
- Ensure the repository is accessible from your Kubernetes cluster (may require network policies, ingress configuration, or proxy settings to be configured)
- Consider storing the configuration in a ConfigMap instead of cloning at runtime to avoid network dependency on each pod startup
k8s/aircraft/service.yml (1)
35-41: No action required: standard busybox images include netcat.

The official busybox Docker image includes the `nc` utility by default, and the init container commands will function correctly. Whilst netcat is technically optional in BusyBox compilation (dependent on CONFIG_NC flags), the standard busybox image used here includes it. This port-checking pattern is widely established in Kubernetes deployments.

load-tests/src/helpers/auth_headers.ts (1)
1-29: LGTM!

The `buildAuthHeaders` utility function is well-implemented with clear documentation. The conditional logic correctly handles both JWT-based gateway authentication and direct service authentication headers. The function signature with optional parameters and default values provides good flexibility for different test scenarios.

k8s/router/service.yml (1)
77-83: Verify PVC initialisation before deployment.

The router depends on two PVCs (`supergraph-schema-pvc` and `router-config-pvc`) being created and populated before the deployment starts. Ensure the jobs that populate these PVCs (referenced in the PR summary) complete successfully before applying this deployment, or consider adding init containers to wait for the required files.

k8s/flights/database.yml (1)
20-20: Verify single-replica StatefulSet appropriateness for production use.

This manifest deploys only 1 PostgreSQL replica, providing no redundancy. For development, this is acceptable; for production, ensure this is intentional or plan for a replication strategy (e.g., PostgreSQL streaming replication or managed database service).
Confirm whether single-replica PostgreSQL is appropriate for your deployment strategy, or plan for a high-availability setup with multiple replicas and failover logic.
k8s/flights/service.yml (1)
51-60: Verify KAFKA_FLIGHTS_TOPIC and other hardcoded service URLs for correctness.

All external service URLs are hardcoded (AIRCRAFT_SERVICE_GRPC_URL, OTLP_GRPC_URL, KAFKA_BROKER_URL, KAFKA_SCHEMA_REGISTRY_URL). Whilst Kubernetes DNS conventions support this, any future namespace reorganisation or service relocation will require manual pod spec updates.
Confirm these hardcoded URLs align with your Kubernetes topology:
- aircraft-service.aircraft:9090
- otel-collector.monitoring:4317
- kafka.messaging.svc.cluster.local:9092
- schema-registry.messaging.svc.cluster.local:8081
If service namespaces or names change frequently, consider using a ConfigMap or external service discovery to inject these values dynamically.
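One hedged way to do that is a ConfigMap injected with `envFrom`; the ConfigMap name below is made up, while the keys and values mirror the URLs listed above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flights-service-endpoints    # assumed name
  namespace: flights
data:
  AIRCRAFT_SERVICE_GRPC_URL: "aircraft-service.aircraft:9090"
  OTLP_GRPC_URL: "otel-collector.monitoring:4317"
  KAFKA_BROKER_URL: "kafka.messaging.svc.cluster.local:9092"
  KAFKA_SCHEMA_REGISTRY_URL: "schema-registry.messaging.svc.cluster.local:8081"
---
# In the Deployment's container spec, replace the individual env entries with:
# envFrom:
#   - configMapRef:
#       name: flights-service-endpoints
```

A namespace or service rename then becomes a single ConfigMap edit plus a rollout, rather than a change to every pod spec.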
k8s/monitoring/grafana.yml (1)
1-13: LGTM! The Service definition is correctly configured for internal cluster access to Grafana on port 3000.
k8s/01-ingress-controller.yml (1)
383-509: LGTM!

The ingress-nginx controller Deployment follows security best practices with a properly configured `securityContext`, running as non-root, dropping all capabilities except `NET_BIND_SERVICE`, and defining appropriate resource limits.

k8s/authentication/service.yml (1)
1-13: LGTM! The Service definition correctly exposes the authentication service on port 8000 within the authentication namespace.
load-tests/src/config.ts (1)
1-7: Verification confirmed: endpoint paths are correctly configured.

The ingress rules in `k8s/03-ingress.yml` define both `/router` and `/router/(.*)` paths routing to router-service:4000, and `/gateway` and `/gateway/(.*)` paths routing to gateway-service:4001. The configuration in `load-tests/src/config.ts` correctly uses these paths: `http://api.aviation.local/router` and `http://api.aviation.local/gateway`. The migration to the Kubernetes ingress configuration is consistent.
Summary by CodeRabbit
New Features
Infrastructure & Deployment