Description
Incident: INC0010025 (ServiceNow)
Severity: 3
Investigation window (UTC): 2026-03-05T14:36:00Z → 2026-03-05T16:36:00Z
Target: Container App rg-octopets-demo-lab/octopetsapi
Impact summary:
- 182 failed requests (500) on GET /api/listings/{id:int} within last 2h
- Exceptions dominated by System.OutOfMemoryException (182)
- Container memory sustained ~79–89% around 16:22–16:33Z, aligned with alert firing at 16:31Z
- 5xx throughput ~6/min during peak window; response time elevated ~780–860 ms
Application Insights evidence:
KQL 1 (Top failing operations):
requests
| where timestamp > ago(2h)
| summarize total=count(), failures=countif(success == false) by name, resultCode
| where failures > 0
| top 10 by failures desc
Results (last 2h):
- GET /api/listings/{id:int}, 500: total=182, failures=182
- GET /, 404: total=4, failures=4
- GET /api/listings/, 499: total=1, failures=1
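A quick follow-up sketch (standard Application Insights `requests` schema; not yet run against this resource) to check whether the 500s cluster around the 16:20Z–16:33Z memory plateau rather than spreading evenly over the window:

```kusto
requests
| where timestamp > ago(2h) and resultCode == "500"
| summarize failures = count() by bin(timestamp, 1m)
| render timechart  // for portal use; drop when querying via API
```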
KQL 2 (Top exception types):
exceptions
| where timestamp > ago(2h)
| summarize count() by type
| top 10 by count_ desc
Results:
- System.OutOfMemoryException: 182
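To confirm the OOMs map onto the failing endpoint rather than background work, the exceptions can be joined back to requests on `operation_Id` (a sketch against the standard App Insights schema):

```kusto
exceptions
| where timestamp > ago(2h) and type == "System.OutOfMemoryException"
| join kind=inner (
    requests
    | where timestamp > ago(2h)
  ) on operation_Id
| summarize oomCount = count() by name, resultCode
```

If essentially all 182 rows land on GET /api/listings/{id:int} with resultCode 500, that strengthens root-cause candidate 1 below.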
KQL 3 (Sample failing request):
requests
| where timestamp > ago(2h) and success == false
| project timestamp, operation_Id, name, resultCode
| top 1 by timestamp desc
Sample:
- 2026-03-05T16:34:02Z, operation_Id=d276a285f14c562542834892ec79fd76, name=GET /api/listings/{id:int}, resultCode=500
Azure Metrics (octopetsapi):
- MemoryPercentage (avg): sustained 74–79% from ~16:20Z–16:33Z with spikes up to ~79–80%; dips after 16:34Z
- WorkingSetBytes (avg): ~859–889 MB between 16:20Z–16:33Z (alert threshold 858,993,459 bytes ≈ 0.8 GiB)
- CpuPercentage (avg): ~16–28% during same period
- Requests[statusCodeCategory=5xx] (avg/min): ~3–7.5 from 16:20Z–16:34Z
- ResponseTime (avg ms): ~780–860 ms from 16:20Z–16:33Z
- RestartCount: no recent increments observed in last 2h
Suspected root cause (ranked):
- Memory leak or unbounded object allocation in GET /api/listings/{id:int} code path → corroborated by 182 System.OutOfMemoryException and coincident high WorkingSetBytes.
- Payload amplification or inefficient serialization (e.g., loading full related entities/images into memory) causing large transient allocations per request → aligns with elevated response time and 5xx bursts without CPU saturation.
- Insufficient container memory limit relative to request workload profile → memory hovering just above the alert threshold suggests headroom is tight.
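If default performance-counter collection is enabled for this Application Insights resource (an assumption — Container Apps workloads may only emit the platform metrics above), process memory can be trended from telemetry to distinguish a monotonic climb (leak) from per-request spikes (transient allocation):

```kusto
performanceCounters
| where timestamp > ago(2h) and name == "Private Bytes"
| summarize avgBytes = avg(value) by bin(timestamp, 1m)
| order by timestamp asc
```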
Proposed fixes:
Code (assuming .NET API):
- Stream responses and use pagination; avoid materializing large collections.
- Ensure IDisposable patterns are followed for streams/db contexts; avoid ToList()/Include() on large graphs.
- Cap result sizes (max page size) for GET /api/listings/{id}; lazy-load large fields (e.g., images) via separate endpoints.
- Add memory-pressure guards that shed load (429/503) before allocations fail with OOM; instrument GC counters (e.g., via dotnet-counters or EventCounters).
IaC/config (Container Apps):
- Raise memory limit or add autoscaling based on MemoryPercentage. Example bicep snippet tweak:
// external/octopets/apphost/infra/main.bicep (container resources section)
container: {
  image: ''
  resources: {
    cpu: json('0.5') // bicep has no decimal literals; json() is the conventional workaround
    memory: '1.5Gi' // was '1Gi'
  }
  env: [
    // consider ASPNETCORE_URLS, GC HeapHardLimitPercent if needed
  ]
}
- Configure scale rules to add replicas when MemoryPercentage > 70% sustained for 5m.
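A sketch of the corresponding scale block using the KEDA `memory` scaler (rule name, replica counts, and threshold are illustrative, not taken from the actual template):

```bicep
scale: {
  minReplicas: 1
  maxReplicas: 4
  rules: [
    {
      name: 'memory-scaler'
      custom: {
        type: 'memory'
        metadata: {
          type: 'Utilization'
          value: '70' // add replicas above 70% memory utilization
        }
      }
    }
  ]
}
```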
- Add Azure Monitor alerts for 5xx rate and ResponseTime alongside memory.
Next steps:
- Reproduce locally/dev with representative payloads; collect dotnet-counters (GC HeapSize, LOH size) under load.
- Implement streaming/pagination; add load test to validate memory stays <65% at p95.
- Submit PR to update bicep with memory and scale adjustments.
References:
- Alert fired: 2026-03-05T16:31:28Z (High Memory Usage - Octopets API)
- Correlation sample: operation_Id d276a285f14c562542834892ec79fd76 (16:34:02Z)
This issue was created by sre-agent-demo--c3c0627e
Tracked by the SRE agent here