
Octopets API: High memory usage causing OutOfMemoryException and 500s on GET /api/listings/{id:int} #86

@gderossilive

Description


Incident: INC0010028 (sys_id: TBA)
Severity: Sev3
Investigation window (UTC): 2026-03-05T14:44:12Z to 2026-03-05T16:44:12Z
Target: Container Apps octopetsapi (/subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-octopets-demo-lab/providers/Microsoft.App/containerApps/octopetsapi)

Impact summary:

  • ~182 failed requests (500) on GET /api/listings/{id:int} within the last 2 hours
  • Sustained memory usage ~880–891 MB (~79% MemoryPercentage) during 16:20–16:41 UTC
  • System.OutOfMemoryException occurrences aligned with failing requests
  • 5xx request rate persisted during the memory spike; response times ~1.0–1.3s for 5xx

Evidence:
Application Insights KQL:

  1. Top failing operations (last 2h)
    requests
    | where timestamp > ago(2h) and success == false
    | summarize failures=count() by name, resultCode
    | top 10 by failures desc
    Results (summary):
  • GET /api/listings/{id:int} 500 → 182
  • GET / 404 → 5
  • GET /api/listings/ 499 → 1
  2. Top exceptions (last 2h)
    exceptions
    | where timestamp > ago(2h)
    | summarize exceptions=count() by type, outerMessage
    | top 10 by exceptions desc
    Results (summary):
  • System.OutOfMemoryException → 182 (outerMessage: "Exception of type 'System.OutOfMemoryException' was thrown.")
  3. Sample failing operation (correlated)
    requests
    | where timestamp > ago(2h) and success == false and name has "/api/listings"
    | project timestamp, name, resultCode, operation_Id, cloud_RoleName
    | top 1 by timestamp desc
    Sample:
  • 2026-03-05T16:34:02.083Z, GET /api/listings/{id:int}, 500, operation_Id=d276a285f14c562542834892ec79fd76, role=[cae-y6uqzjyatoawm]/octopetsapi
  4. Exception sample (correlated)
    exceptions
    | where timestamp > ago(2h)
    | project timestamp, type, outerMessage, operation_Id
    | top 1 by timestamp desc
    Sample:
  • 2026-03-05T16:34:03.103Z, System.OutOfMemoryException, operation_Id=d276a285f14c562542834892ec79fd76
  5. Memory-related traces (last 2h)
    traces
    | where timestamp > ago(2h)
    | where tostring(message) has_any ("OutOfMemory", "OOM", "memory", "heap")
    | summarize entries=count() by bin(timestamp, 15m)
    Results: memory-related trace entries clustered in the 16:15 and 16:30 UTC bins

Azure Metrics (Microsoft.App/containerapps):

  • MemoryPercentage (last 2h): sustained ~78–79% from 16:22 to 16:41 UTC, then dropped to ~5–6% from 16:44 onward
  • WorkingSetBytes: ~880–891 MB from 16:22 to 16:41 UTC, then ~98–109 MB from 16:44 onward
  • CpuPercentage: ~15–28% during spike; near 0 around 16:36–16:41 and after 16:44
  • Requests (5xx): average ~3 at 16:20, ~6 steady through ~16:35, then 0 after ~16:36
  • ResponseTime (5xx): ~1.0–1.3s during spike window
  • RestartCount: near 0; brief non-zero after 16:44 (0–1.5), suggesting a restart or scale event

Suspected root cause(s):

  1. High-probability: Memory-intensive path in GET /api/listings/{id:int} leading to System.OutOfMemoryException under load. Supported by aligned OOM exceptions, high working set (~890 MB), sustained high memory percentage, and concentrated 500s on that endpoint.
  2. Medium-probability: Unbounded object/materialization (e.g., deserializing large payloads, loading large related blobs/images) or inefficient DTO mapping causing large transient allocations and GC pressure.
  3. Lower-probability: container app memory limit misconfigured relative to the workload's peak usage; a brief restart/scale event observed after 16:42 reduced the memory footprint.

Proposed fixes:
Code:

  • Stream data for GET /api/listings/{id:int} instead of materializing full objects; use projection to minimal DTOs and avoid loading large related data eagerly.
  • Add defensive guards for large payloads (size caps) and lazy-load/async-stream related resources (images/blobs).
  • Review EF/ORM query includes; replace with Select projections; ensure pagination where relevant; validate that images/blob content is not embedded inline for the id path.
  • Introduce memory profiling (e.g., dotnet-counters, dotnet-gcdump) in staging to identify hotspots.
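The projection approach in the first three bullets can be sketched as below. Entity and type names (`Listing`, `ListingSummaryDto`, `AppDbContext`) are hypothetical, since the actual Octopets code paths are not attached yet; the point is that the `Select` projection runs in the database, so only the chosen columns are materialized and image/blob content is never pulled onto the heap for this path:

```csharp
// Fragment of a minimal-API Program.cs (requires Microsoft.EntityFrameworkCore).
using Microsoft.EntityFrameworkCore;

// Small DTO shaped for this endpoint only; expose a count, not the image bytes.
public record ListingSummaryDto(int Id, string Name, string? Description, int ImageCount);

app.MapGet("/api/listings/{id:int}", async (int id, AppDbContext db) =>
{
    var dto = await db.Listings
        .AsNoTracking()                       // read-only: skip change-tracking allocations
        .Where(l => l.Id == id)
        .Select(l => new ListingSummaryDto(
            l.Id,
            l.Name,
            l.Description,
            l.Images.Count))                  // translated to SQL COUNT; blobs stay in the DB
        .FirstOrDefaultAsync();

    return dto is null ? Results.NotFound() : Results.Ok(dto);
});
```

If clients do need image content, serve it from a separate endpoint that streams the blob (e.g. `Results.Stream`) instead of embedding it inline in the listing payload.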

IaC/config:

  • Explicitly set container memory requests/limits aligned to observed peaks (e.g., allocate >1GB if justified) and add autoscale policies based on MemoryPercentage to prevent saturation.
  • Add Azure Monitor alert for App Insights OutOfMemoryException count > N in 10m.
  • Configure health probes/timeouts to shed load gracefully during memory pressure.
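The IaC items above might look like the following Container Apps template fragment (Bicep). The 2Gi limit, 70% scale threshold, probe path/port, and replica counts are illustrative assumptions to be validated against the observed ~890 MB peak, not measured requirements:

```bicep
// Illustrative fragment of the octopetsapi container app definition.
resource octopetsapi 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'octopetsapi'
  location: location
  properties: {
    managedEnvironmentId: environmentId
    template: {
      containers: [
        {
          name: 'octopetsapi'
          image: apiImage
          resources: {
            cpu: json('1.0')
            memory: '2Gi' // raised above the ~890 MB peak; confirm with profiling first
          }
          probes: [
            {
              type: 'Liveness'
              httpGet: { path: '/healthz', port: 8080 } // path/port assumed
              periodSeconds: 10
              failureThreshold: 3
            }
          ]
        }
      ]
      scale: {
        minReplicas: 1
        maxReplicas: 5
        rules: [
          {
            name: 'memory-utilization'
            custom: {
              type: 'memory' // KEDA memory scaler
              metadata: { type: 'Utilization', value: '70' }
            }
          }
        ]
      }
    }
  }
}
```

Note that memory/CPU scale rules in Container Apps do not scale to zero, so `minReplicas: 1` is deliberate here.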

Next steps:

  • Implement streaming/projection in the listings-by-id handler, add load test, and reprofile memory.
  • Adjust Container Apps memory limits/requests if code-level reductions are insufficient.
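For the reprofiling step, a first pass in staging could use the standard .NET diagnostics tools already mentioned under "Code"; the PID (1234) is a placeholder to replace with the actual API process:

```shell
# List candidate .NET processes to find the API's PID
dotnet-counters ps

# Watch GC heap size, allocation rate, and working set live (PID assumed)
dotnet-counters monitor --process-id 1234 --counters System.Runtime

# Capture a GC heap dump while memory is elevated, for offline analysis
dotnet-gcdump collect --process-id 1234 --output octopetsapi.gcdump
```

Comparing gcdumps taken before and during a load test against GET /api/listings/{id:int} should show which types dominate the large transient allocations.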

Please assign to API owners for remediation. Attach relevant code paths and diffs once identified.

This issue was created by sre-agent-demo--c3c0627e
Tracked by the SRE agent here
