
Conversation

@hablutzel1
Contributor

Deploying the project to AWS with the default configuration results in a slow and randomly failing API, as can be reproduced with the following simple script:

#!/bin/bash
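# Repeatedly calls the MPIC API and prints the latency of each request.
# Note: gdate (GNU date, e.g. from coreutils on macOS) is used for millisecond-precision timestamps.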

API_URL="https://zounb7fdwc.execute-api.us-east-2.amazonaws.com/v1/mpic"
API_KEY="xxx"

while true; do
  start_time=$(gdate +%s%3N)
  RESPONSE=$(curl --silent --location "$API_URL" \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --header "x-api-key: $API_KEY" \
    --data "{
            \"check_type\": \"caa\",
            \"domain_or_ip_target\": \"example.org\"
          }"
  )
  end_time=$(gdate +%s%3N)
  duration=$((end_time - start_time))
  echo "Response took ${duration}ms: ${RESPONSE:0:100}..."
done

That produces output like the following:

$ ./reproduce_timeout_problem.sh 
Response took 11671ms: {"mpic_completed":true,"request_orchestratio...
Response took 2151ms: {"mpic_completed":true,"request_orchestration...
Response took 5453ms: {"mpic_completed":true,"request_orchestration...
Response took 28451ms: {"mpic_completed":true,"request_orchestratio...
Response took 26961ms: {"mpic_completed":true,"request_orchestratio...
Response took 29623ms: {"message": "Endpoint request timed out"}...

It can also be observed that the coordinator Lambda is almost always using close to 100% of its memory:

$ aws logs filter-log-events --log-group-name '/aws/lambda/open_mpic_lambda_coordinator_826858333' | grep "Max Memory Used"
            "message": "REPORT RequestId: 510a9d2c-a968-4598-a6d8-befb041dc95f\tDuration: 1700.98 ms\tBilled Duration: 1701 ms\tMemory Size: 128 MB\tMax Memory Used: 124 MB\t\n",
            "message": "REPORT RequestId: f36fde4d-2573-48c0-9726-d352d8455283\tDuration: 1582.41 ms\tBilled Duration: 1583 ms\tMemory Size: 128 MB\tMax Memory Used: 126 MB\t\n",
            "message": "REPORT RequestId: 36fecd8c-dbc3-433b-b2ac-ca1952e9a09b\tDuration: 1569.24 ms\tBilled Duration: 1570 ms\tMemory Size: 128 MB\tMax Memory Used: 128 MB\t\n",
            "message": "REPORT RequestId: 3dd7166c-4387-4b1b-9087-9a788c0de17c\tDuration: 1804.33 ms\tBilled Duration: 1805 ms\tMemory Size: 128 MB\tMax Memory Used: 128 MB\t\n",
            "message": "REPORT RequestId: 99b6cc3d-7d1e-47ca-bcd9-4ef84ecf25eb\tDuration: 7349.00 ms\tBilled Duration: 7350 ms\tMemory Size: 128 MB\tMax Memory Used: 128 MB\t\n",
            "message": "REPORT RequestId: f2665534-7abb-44f6-9a73-e9f02b39c501\tDuration: 63057.97 ms\tBilled Duration: 60000 ms\tMemory Size: 128 MB\tMax Memory Used: 128 MB\t\n",

But if the Lambda memory is doubled to 256 MB, the following memory consumption is observed and the API becomes stable:

$ aws logs filter-log-events --log-group-name '/aws/lambda/open_mpic_lambda_coordinator_826858333' | grep "Max Memory Used"
...
            "message": "REPORT RequestId: ef34d358-0d32-4f21-8d04-b94336a49c92\tDuration: 661.62 ms\tBilled Duration: 662 ms\tMemory Size: 256 MB\tMax Memory Used: 218 MB\t\n",
            "message": "REPORT RequestId: bdfee13f-2dbe-4acc-a8e0-c819dd17b13c\tDuration: 674.05 ms\tBilled Duration: 675 ms\tMemory Size: 256 MB\tMax Memory Used: 218 MB\t\n",
            "message": "REPORT RequestId: 88cbaf53-ea55-4b0b-a83e-637927545fe6\tDuration: 685.87 ms\tBilled Duration: 686 ms\tMemory Size: 256 MB\tMax Memory Used: 219 MB\t\n",
            "message": "REPORT RequestId: 8bb729d7-8182-43d9-919e-a705047434c9\tDuration: 723.59 ms\tBilled Duration: 724 ms\tMemory Size: 256 MB\tMax Memory Used: 219 MB\t\n",
            "message": "REPORT RequestId: 4d5f89c4-d569-4eaa-8145-72cbd63ba03d\tDuration: 349.70 ms\tBilled Duration: 350 ms\tMemory Size: 256 MB\tMax Memory Used: 219 MB\t\n",
            "message": "REPORT RequestId: c5d2ce6b-cd47-4e8d-8127-ada9cde7fe4b\tDuration: 241.34 ms\tBilled Duration: 242 ms\tMemory Size: 256 MB\tMax Memory Used: 219 MB\t\n",

Now, ~219 MB out of 256 MB (~86%) might still be dangerously close to the limit, so you might want to increase the default even further for safety (e.g. to accommodate future growth of the codebase).
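As a stopgap before a new default ships, an already-deployed coordinator's memory can also be raised in place with the AWS CLI. This is only a sketch; the function name below is taken from the log group shown above and will differ per deployment:

# Raise the coordinator Lambda's memory allocation in place
# (function name taken from the log group above; adjust for your deployment).
aws lambda update-function-configuration \
  --function-name open_mpic_lambda_coordinator_826858333 \
  --memory-size 256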

@birgelee
Member

Thanks for bringing this to our attention. We intend to address this, and we appreciate your help with the AWS tuning. We had also noticed some timeouts, particularly with high remote-perspective counts.

@birgelee
Member

birgelee commented Apr 1, 2025

Thanks for contributing this. I confirmed it passes all integration tests, and it fixed a bug we previously had where integration tests would sometimes fail.

I did take the liberty of doubling the memory to 512 MB, as I feel 80% memory pressure is not good and could be exceeded if even more perspectives were added.
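For anyone checking a deployment, the configured memory size can be confirmed with a quick AWS CLI query. A minimal sketch, reusing the function name from the logs above (adjust as needed):

# Confirm the configured memory size of the coordinator Lambda.
aws lambda get-function-configuration \
  --function-name open_mpic_lambda_coordinator_826858333 \
  --query 'MemorySize'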

@birgelee birgelee merged commit 8fb72c0 into open-mpic:main Apr 1, 2025
1 check passed
@hablutzel1 hablutzel1 deleted the coordinator-memory branch April 20, 2025 22:21