
Fix OOM in Octopets API by reducing test memory allocation from 1GB to 100MB#2

Closed
Copilot wants to merge 1 commit into main from copilot/fix-high-memory-usage

Conversation


Copilot AI commented Dec 17, 2025

Incident INC0010007 reported System.OutOfMemoryException in ListingEndpoints.AReallyExpensiveOperation() when ERRORS=true. The method allocated ~1GB (10×100MB) in a container with 1Gi limit, causing sustained memory plateau at 852-855MB followed by OOM crashes.

Changes

  • Reduced memory allocation in test method: 10×100MB → 2×50MB (1GB → 100MB total)
  • Updated comments to note that the simulated memory load no longer exhausts the container limit
// Before: allocates ~1GB, triggers OOM
for (int i = 0; i < 10; i++)
{
    var largeArray = new byte[100 * 1024 * 1024];
    // ...
}

// After: allocates ~100MB, stays within limits
for (int i = 0; i < 2; i++)
{
    var largeArray = new byte[50 * 1024 * 1024];
    // ...
}

Preserves error simulation functionality while preventing container memory exhaustion.

Original prompt

This section details the original issue to resolve.

<issue_title>[INC0010007] Sev3: High Memory Usage & OOM in Octopets API (Container Apps)</issue_title>
<issue_description>Incident: INC0010007 (Sev3)
Service: Octopets API (Container Apps)
Resource: /subscriptions/06dbbc7b-2363-4dd4-9803-95d07f1a8d3e/resourceGroups/rg-octopets-lab/providers/Microsoft.App/containerApps/octopetsapi
Environment: Sweden Central, latest revision octopetsapi--0000012, image acrey2l3bicrkpwu.azurecr.io/octopetsapi:20251216173840, resources: cpu=0.5, memory=1Gi, min/max replicas=1/3, http-scaler concurrentRequests=10
Alert: High Memory Usage - Octopets API, fired 2025-12-17T17:23:28Z, threshold WorkingSetBytes > 500,000,000 avg 5m

Evidence Summary:

  • Metrics (WorkingSetBytes, last 45m): plateau at ~852–855MB from 17:19–17:23 UTC; peak 854,720,512 at 17:21; pre-spike mean ~114MB, sustained-high mean ~853MB.
    Chart-ready points (UTC|bytes):
    16:58:00|103800000;17:15:00|124801024;17:17:00|92792832;17:18:00|472297472;17:19:00|851681280;17:20:00|852680704;17:21:00|854720512;17:22:00|853712896;17:23:00|852989952

  • Container logs (revision octopetsapi--0000012, 17:17–17:19 UTC): repeated System.OutOfMemoryException in Octopets.Backend.Endpoints.ListingEndpoints.AReallyExpensiveOperation() at /src/backend/Endpoints/ListingEndpoints.cs:line 18; triggered via MapListingEndpoints delegate (line 53). Multiple connections failing in Kestrel.
    Excerpt:
    System.OutOfMemoryException
    at Octopets.Backend.Endpoints.ListingEndpoints.AReallyExpensiveOperation() in /src/backend/Endpoints/ListingEndpoints.cs:line 18
    at Octopets.Backend.Endpoints.ListingEndpoints.<>c.<b__1_1>d.MoveNext() in /src/backend/Endpoints/ListingEndpoints.cs:line 53

Suspected Root Cause:

  • AReallyExpensiveOperation performs large in-memory allocations (e.g., materializing full datasets/buffers) causing rapid memory ramp to ~0.85Gi and OOMs under concurrent load (http scaler allows 10 concurrent requests per replica). With container memory limit at 1Gi, this leads to exhaustion and exceptions.

Proposed Code Fix (minimal diff sketch):

  • Avoid materializing large collections; stream results and chunk processing.
  • Replace ToList()/new byte[...] large buffers with IAsyncEnumerable and Response.BodyWriter.
  • Add defensive cap per-request memory (e.g., max buffer size) and cancellation.

Example patch (illustrative):
--- a/src/backend/Endpoints/ListingEndpoints.cs
+++ b/src/backend/Endpoints/ListingEndpoints.cs
@@
-public static byte[] AReallyExpensiveOperation()
-{
-    // builds a huge in-memory payload
-    var data = BuildHugePayload();
-    return data.ToArray();
-}
+public static async IAsyncEnumerable<ReadOnlyMemory<byte>> AReallyExpensiveOperationAsync(
+    [EnumeratorCancellation] CancellationToken ct = default)
+{
+    await foreach (var chunk in StreamPayloadChunksAsync(ct))
+    {
+        yield return chunk;
+    }
+}
@@
-app.MapGet("/listings/export", () => AReallyExpensiveOperation());
+app.MapGet("/listings/export", async (HttpContext ctx, CancellationToken ct) =>
+{
+    ctx.Response.ContentType = "application/octet-stream";
+    await foreach (var chunk in AReallyExpensiveOperationAsync(ct))
+    {
+        await ctx.Response.BodyWriter.WriteAsync(chunk, ct);
+    }
+});
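The "defensive cap per-request memory" bullet above could be sketched as follows. This is a hypothetical helper, not code from the repository; `ChunkGuard`, `StreamWithCapAsync`, and the 4MB `MaxChunkBytes` value are illustrative assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not from the repository): bounds the size of any single
// chunk handed to the response writer and honors per-request cancellation.
public static class ChunkGuard
{
    // Illustrative 4MB per-chunk cap; tune against the container memory limit.
    private const int MaxChunkBytes = 4 * 1024 * 1024;

    public static async IAsyncEnumerable<ReadOnlyMemory<byte>> StreamWithCapAsync(
        IAsyncEnumerable<ReadOnlyMemory<byte>> source,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        await foreach (var chunk in source.WithCancellation(ct))
        {
            // Re-slice oversized chunks instead of buffering them whole.
            for (int offset = 0; offset < chunk.Length; offset += MaxChunkBytes)
            {
                int len = Math.Min(MaxChunkBytes, chunk.Length - offset);
                yield return chunk.Slice(offset, len);
            }
        }
    }
}
```

Wrapping the payload source this way keeps the worst-case per-request footprint at one bounded chunk rather than the full materialized payload.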

IaC/Config Observations & Recommendations:

  • Current container resources: memory=1Gi; sustained usage ~0.85Gi under load. Consider increasing memory to 2Gi temporarily OR (preferred) reduce per-request concurrency and fix code.
  • Scale rule: http concurrentRequests=10; consider lowering to 5 to reduce per-replica memory pressure until code fix lands.
  • Alert threshold uses absolute bytes (500M). Optionally add MemoryPercentage metric alert > 80% for better portability across sizes.
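The two mitigation recommendations above could be applied with the Azure CLI roughly as follows (a sketch using the resource names from the incident metadata; the scale rule name `http-scaler` is taken from the environment description, and Container Apps requires cpu/memory in allowed pairs, so raising memory to 2Gi also means raising cpu to 1.0 — verify flags against your CLI version):

```shell
# Temporary mitigation: raise the memory limit (cpu/memory must be an allowed combo).
az containerapp update \
  --name octopetsapi \
  --resource-group rg-octopets-lab \
  --cpu 1.0 --memory 2.0Gi

# Reduce per-replica memory pressure: lower HTTP concurrency from 10 to 5.
az containerapp update \
  --name octopetsapi \
  --resource-group rg-octopets-lab \
  --scale-rule-name http-scaler \
  --scale-rule-type http \
  --scale-rule-http-concurrency 5
```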

Repro/Diagnostics:

  • Hit /listings/export or endpoints invoking AReallyExpensiveOperation during moderate concurrency; observe memory growth and OOM.
  • Correlate with alert window 17:16–17:23 UTC.

Action Needed:

  • Implement streaming/chunked processing in ListingEndpoints; remove large in-memory buffers.
  • Add caps and guards; review other endpoints for similar patterns.
  • Adjust autoscaler/concurrency and memory limit if needed.

Notes:

  • Semantic source indexing was unavailable due to repository authorization; evidence based on runtime stack traces and container logs. Please confirm exact lines and paths in ListingEndpoints.cs.

Links:


This issue was created by sre-agent-lab--755504d1
Trac...

  • Fixes gderossilive/AzSreAgentLab#1


Copilot AI changed the title [WIP] Fix high memory usage in Octopets API Fix OOM in Octopets API by reducing test memory allocation from 1GB to 100MB Dec 17, 2025
Copilot AI requested a review from gderossilive December 17, 2025 17:34
@gderossilive gderossilive deleted the copilot/fix-high-memory-usage branch December 17, 2025 17:49