Skip to content

Conversation

@hablutzel1
Copy link
Contributor

This PR improves the cold start time by a around 2 seconds. There are some other options to prevent cold starts, like Provisioned Concurrency, SnapStart or EventBridge Scheduler (as a warmer), but they cannot fully eliminate cold starts. For example, with Provisioned Concurrency, if the number of concurrent requests surpasses the number of provisioned execution environments, cold starts are still going to happen.

So, given that cold starts cannot be totally eliminated, it makes sense to improve cold start first.

With the current implementation, for 5 concurrent requests sent using an script like the following:

#!/bin/bash

API_URL="https://4mw4jvkdr1.execute-api.us-east-2.amazonaws.com/v1/mpic"
API_KEY="xxx"

for i in {1..5}; do
  (
    start_time=$(gdate +%s%3N)
    RESPONSE=$(curl --silent --location "$API_URL" \
        --header 'Content-Type: application/json' \
        --header 'Accept: application/json' \
        --header "x-api-key: $API_KEY" \
        --data "{
          \"check_type\": \"caa\",
          \"domain_or_ip_target\": \"example.org\"
        }")
    end_time=$(gdate +%s%3N)
    duration=$((end_time - start_time))
    echo "Process $i: Response took ${duration}ms: ${RESPONSE:0:100}..."
  ) &
done

wait

The result is:

Process 2: Response took 10217ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 4: Response took 10306ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 3: Response took 10334ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 5: Response took 10336ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 1: Response took 10350ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...

With the proposed changes:

Process 1: Response took 8366ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 3: Response took 8423ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 2: Response took 8460ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 5: Response took 8570ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...
Process 4: Response took 8582ms: {"mpic_completed":true,"request_orchestration_parameters":null,"actual_orchestration_parameters":{"p...

Note that for obtaining the previous cold starts results, it is necessary to wait around 7 minutes without any requests to AWS to get all the current execution environments to be shut down.

On the other hand, for warm starts some improvements have been observed as well.

Now, about the changes, the pool of clients has been reduced to a single client per region as Lambda execution environments only execute one request at the time. From https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html:

During this entire process, this execution environment is busy and cannot process other requests.

Additionally, I've increased the memory for the perspectives to 256 MB as I've observed they are running close to the current limit of 128MB.

On the other hand, I've reduced the default memory to 256 MB for the coordinator as the average usage is currently around 160 MB. Doubling the memory increases the cost without noticeable performance improvements.

Finally, in the future, AWS Lambda Power Tuning (see https://aws.amazon.com/pt/blogs/compute/operating-lambda-performance-optimization-part-2/) could possibly be introduced to automatically measure and supervise the memory usage of the functions as changes are being introduced.

_handler = MpicCaaCheckerLambdaHandler()
return _handler

if os.environ.get("PYTEST_VERSION") is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a newline before and after these if statements (so there's 2 lines between top level commands) for PEP8 compliance (PyCharm give me style warnings here). In general, in the main.py files, have there be 2 newlines between top level commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Now, to prevent these basic errors in the future, a linting tool like pycodestyle could be introduced to check for some specific coding conventions errors like E305 (https://pycodestyle.pycqa.org/en/latest/intro.html#error-codes).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good thought. I'll add it to our to-dos to look at to toss into the pipeline.

_handler = MpicDcvCheckerLambdaHandler()
return _handler

if os.environ.get("PYTEST_VERSION") is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to not use a check that looks for PYTEST_VERSION (set by virtue of being in a unit test run)... I prefer a more explicit env variable to drive this.

Recommend this in each of the main.py files instead:

if os.environ.get("INIT_LAMBDA_HANDLER", "true").lower() == "true":
    get_handler()

Then, to set the variable accordingly in the unit tests, I would leverage the existing conftest.py and add this:

def pytest_configure():
    os.environ["INIT_LAMBDA_HANDLER"] = "false"

at the top, just under the imports and above def setup_logging(). But do preserve 2 spaces between imports and the next command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep things simpler, what do you think of looking for AWS_LAMBDA_FUNCTION_NAME (https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-runtime) to detect code is running in AWS Lambda?

@hablutzel1 hablutzel1 force-pushed the improve-performance branch from be85d95 to 06b4db4 Compare April 26, 2025 01:47
@hablutzel1 hablutzel1 requested a review from sciros April 26, 2025 01:50
@birgelee
Copy link
Member

I spoke with @sciros and the AWS lambda name check is OK since we are in an AWS repo. I am going ahead with merging this in since all comments are addressed.

@birgelee birgelee merged commit e96c396 into open-mpic:main Apr 27, 2025
@hablutzel1 hablutzel1 deleted the improve-performance branch April 29, 2025 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants