Conversation


@mcopik mcopik commented Aug 9, 2022

The features that we're planning to implement here are:

  • Building C++ functions with CMake.
  • Building and handling functions' Docker containers for C++.
  • Providing base images with AWS SDK, Boost, and Redis support.
  • Test sleep benchmarks.

Summary by CodeRabbit

  • New Features
    • First-class C++ support: CLI/tooling accepts "cpp"; C++ benchmarks (sleep, thumbnailer, image-recognition, pagerank, bfs) and C++ packaging/build/image flow added.
  • Bug Fixes
    • Safer storage stop and cache handling to avoid missing-key errors and improve caching robustness.
  • Refactor
    • Unified Language/Architecture enums across packaging and deployment.
  • Chores
    • Updated Docker build tooling and dependency images, CI orb, docs, and new runtime/config presets for Python/NodeJS/CPP.



coderabbitai bot commented Jun 18, 2025

📝 Walkthrough


Adds first-class C++ support across benchmarks and tooling: C++ benchmark implementations and wrappers, an AWS C++ Lambda handler, S3/DynamoDB/Redis C++ wrappers, multi-stage Dockerfiles and installer for C++ dependencies, a C++ dependency model, Language/Architecture enums, packaging/build API refactors, and regression/test updates.

Changes

Cohort / File(s) Change Summary
Microbenchmark (sleep)
benchmarks/000.microbenchmarks/010.sleep/config.json, benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp
Enable cpp in config and add a C++ sleep function returning JSON.
AWS C++ runtime & wrappers
benchmarks/wrappers/aws/cpp/handler.cpp, benchmarks/wrappers/aws/cpp/key-value.*, benchmarks/wrappers/aws/cpp/redis.*, benchmarks/wrappers/aws/cpp/storage.*, benchmarks/wrappers/aws/cpp/utils.*
New Lambda runtime handler and storage/redis/dynamodb wrappers with timing, retries/backoff, streaming, cold-start/container metadata, and microsecond timer util.
Multimedia thumbnailer (C++)
benchmarks/200.multimedia/210.thumbnailer/cpp/*
New OpenCV C++ thumbnailer: download, resize, encode, upload, UUID output naming and timing measurements.
Image recognition (C++)
benchmarks/400.inference/411.image-recognition/cpp/*
Added LibTorch/OpenCV-based helpers and recognition scaffolding (model loading, preprocessing, partial handler).
Scientific benchmarks (C++)
benchmarks/500.scientific/501.graph-pagerank/cpp/*, benchmarks/500.scientific/503.graph-bfs/cpp/*
New igraph-based PageRank and BFS C++ implementations returning measurement JSON.
C++ support registration & configs
config/systems.json, multiple benchmarks/*/config.json
Register cpp language, add cpp_dependencies entries and list C++ deployment files.
Dockerfiles & installer
dockerfiles/aws/cpp/Dockerfile.build, dockerfiles/aws/cpp/Dockerfile.function, dockerfiles/aws/cpp/Dockerfile.dependencies-*, dockerfiles/cpp_installer.sh, dockerfiles/aws/nodejs/Dockerfile.build
Add multi-stage Dockerfiles for C++ dependencies and function images, dependency builders, installer script; minor Node.js build deps change.
C++ dependency model
sebs/cpp_dependencies.py
New CppDependencies enum and CppDependencyConfig with helpers to emit CMake fragments and generate Dockerfile copy/image fragments.
Core typing & enums
sebs/types.py
Add Language and Architecture enums with (de)serialization utilities (used across packaging and runtimes).
Packaging & build refactors
sebs/benchmark.py, sebs/aws/aws.py, sebs/faas/container.py, sebs/faas/system.py, sebs/faas/function.py, sebs/azure/azure.py, sebs/gcp/gcp.py, sebs/openwhisk/openwhisk.py, sebs/local/local.py
Replace string-based language handling with Language enum, introduce CMake-based CPP packaging, change package_code signatures/return shapes, thread optional container_uri through create/update flows, and update runtime resolution.
Cache & regression/test harness
sebs/cache.py, sebs/regression.py
Cache now distinguishes code_location vs container URI; regression adds C++ benchmark list, AWSTestSequenceCpp and per-test metadata (architecture, deployment_type, benchmark).
Tools & image builder
tools/build_docker_images.py
Add dependency-image build support, CLI args (--type dependencies, --type-tag, --parallel), add cpp language option, and dependency build loops.
New experiment configs
config/cpp.json, config/nodejs.json, config/python.json
Add C++ and updated Node/Python experiment definitions and deployment configs.
Docs & CI
docs/storage.md, .circleci/config.yml, README.md, docs/build.md, docs/benchmarks.md
Update storage docs, bump CircleCI python orb and executor tag, require Python 3.10+, and expand build docs with C++ packaging/image flows.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as Developer CLI
    participant SeBS as SeBS packager
    participant Builder as Docker builder
    participant Registry as Image registry
    participant Cloud as Cloud API
    participant Lambda as Managed runtime
    participant Handler as C++ handler
    participant Storage as S3/DynamoDB/Redis

    CLI->>SeBS: package/build --language cpp
    SeBS->>SeBS: generate CMakeLists.txt & deployment package
    SeBS->>Builder: build dependency images & function image
    Builder->>Registry: push images (optional)
    SeBS->>Cloud: create_function(runtime="provided.al2", package/container)
    CLI->>Lambda: invoke function
    Lambda->>Handler: start (cold-start metadata)
    Handler->>Handler: parse payload → call user C++ function
    Handler->>Storage: S3/DynamoDB/Redis operations
    Handler-->>Lambda: return JSON result with measurements
    Lambda->>CLI: response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

"I nibble CMake leaves at dawn,
Dockers hum while builds are drawn,
Enums hop in, binaries compile,
Benchmarks tumble, tests run in style,
The rabbit hops — build done with a grin." 🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 6.90%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Add support for C++ benchmarks' directly and clearly describes the main objective of this PR: introducing C++ as a new supported language for the serverless benchmarks framework.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 25

🔭 Outside diff range comments (4)
benchmarks/wrappers/aws/cpp/redis.hpp (1)

1-27: Add include guards and improve class design.

The header file lacks include guards and has several design issues that should be addressed:

  1. Missing include guards: The header should have include guards to prevent multiple inclusions
  2. Constructor should be explicit: Single-parameter constructors should be explicit to prevent implicit conversions
  3. Missing const correctness: Methods that don't modify state should be const
  4. Raw pointer management: Consider RAII principles for redisContext*
+#ifndef REDIS_HPP
+#define REDIS_HPP
+
 #include <cstdint>
 #include <string>

 #include <aws/core/utils/memory/stl/AWSString.h>

 #include <hiredis/hiredis.h>

 class Redis
 {
   redisContext* _context;
 public:

-  Redis(std::string redis_hostname, int redis_port);
+  explicit Redis(const std::string& redis_hostname, int redis_port);
   ~Redis();

-  bool is_initialized();
+  bool is_initialized() const;

-  uint64_t download_file(Aws::String const &key, int &required_retries, bool with_backoff);
+  uint64_t download_file(const Aws::String& key, int& required_retries, bool with_backoff) const;

-  uint64_t upload_file(Aws::String const &key, int size, char* pBuf);
+  uint64_t upload_file(const Aws::String& key, int size, const char* pBuf);

-  uint64_t delete_file(std::string const &key);
+  uint64_t delete_file(const std::string& key) const;

 };

+#endif // REDIS_HPP
benchmarks/wrappers/aws/cpp/key-value.hpp (1)

1-32: Add include guards and improve interface design.

The header file has several areas for improvement:

  1. Missing include guards: Add them to prevent multiple inclusions
  2. Const correctness: Methods that don't modify state should be const
  3. Parameter const-correctness: Input parameters should be const references
  4. Comment about non-copyable/non-movable: Should be enforced with deleted copy/move operators
+#ifndef KEY_VALUE_HPP
+#define KEY_VALUE_HPP
+
 #include <cstdint>
 #include <string>
 #include <initializer_list>
 #include <memory>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/dynamodb/DynamoDBClient.h>

 class KeyValue
 {
-  // non-copyable, non-movable
   std::shared_ptr<Aws::DynamoDB::DynamoDBClient> _client;
 public:

   KeyValue();
+  
+  // non-copyable, non-movable
+  KeyValue(const KeyValue&) = delete;
+  KeyValue& operator=(const KeyValue&) = delete;
+  KeyValue(KeyValue&&) = delete;
+  KeyValue& operator=(KeyValue&&) = delete;

-uint64_t download_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t download_file(const Aws::String& bucket,
+                        const Aws::String& key,
                         int& required_retries,
                         double& read_units,
-                        bool with_backoff = false);
+                        bool with_backoff = false) const;

-uint64_t upload_file(Aws::String const &bucket,
-                        Aws::String const &key,
+uint64_t upload_file(const Aws::String& bucket,
+                        const Aws::String& key,
                           double& write_units,
                           int size,
-                          unsigned char* pBuf);
+                          const unsigned char* pBuf);

 };

+#endif // KEY_VALUE_HPP
benchmarks/wrappers/aws/cpp/storage.hpp (1)

1-29: Add include guards and improve interface consistency.

The header file needs several improvements:

  1. Missing include guards: Add them for safety
  2. Interface consistency: Consider aligning parameter patterns with other wrapper classes
  3. Const correctness: Download method should be const
+#ifndef STORAGE_HPP
+#define STORAGE_HPP
+
 #include <cstdint>

 #include <aws/core/utils/memory/stl/AWSString.h>
 #include <aws/s3/S3Client.h>

 class Storage
 {
   Aws::S3::S3Client _client;
 public:

-  Storage(Aws::S3::S3Client && client):
+  explicit Storage(Aws::S3::S3Client&& client):
     _client(client)
   {}

   static Storage get_client();

-  uint64_t download_file(Aws::String const &bucket,
-                          Aws::String const &key,
+  uint64_t download_file(const Aws::String& bucket,
+                          const Aws::String& key,
                           int &required_retries,
-                          bool report_dl_time);
+                          bool report_dl_time) const;

-  uint64_t upload_random_file(Aws::String const &bucket,
-                          Aws::String const &key,
-                          int size, char* pBuf);
+  uint64_t upload_random_file(const Aws::String& bucket,
+                          const Aws::String& key,
+                          int size, const char* pBuf);

 };

+#endif // STORAGE_HPP
tools/build_docker_images.py (1)

51-151: CRITICAL: Resolve Git merge conflicts before merging.

The file contains unresolved Git merge conflict markers (<<<<<<<, =======, >>>>>>>) that cause syntax errors and prevent the code from running. These conflicts must be resolved manually.

The conflicts appear to be between:

  • HEAD version with dot notation: f"{args.type}.{args.type_tag}"
  • Branch a9f3c27 with hyphen notation: f"{args.type}-{args.type_tag}"

Please resolve these conflicts by choosing the appropriate format and removing all conflict markers.

🧹 Nitpick comments (18)
dockerfiles/aws/nodejs/Dockerfile.build (1)

5-5: Optimize yum layer by cleaning cache.

To slim down the final image, combine the install and cleanup steps into a single RUN and clear the yum cache:

-RUN yum install -y cmake curl libcurl libcurl-devel
+RUN yum install -y cmake curl libcurl libcurl-devel && \
+    yum clean all && \
+    rm -rf /var/cache/yum
dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1)

7-8: Consider adding error handling and cleanup.

The build process could benefit from explicit error handling and cleanup to ensure robust builds and smaller image sizes.

-RUN   git clone https://github.com/redis/hiredis.git && cd hiredis\
-      && PREFIX=/opt make -j${WORKERS} install
+RUN set -e && \
+    git clone --depth 1 --branch v1.2.0 https://github.com/redis/hiredis.git && \
+    cd hiredis && \
+    PREFIX=/opt make -j${WORKERS} install && \
+    cd .. && \
+    rm -rf hiredis
benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (2)

7-7: Remove unused include.

The iostream header is included but never used in this file.

 #include <thread>
-#include <iostream>

16-17: Remove commented-out code.

Commented-out code should be removed to keep the codebase clean. If this alternative implementation is needed for reference, consider documenting it properly or moving it to documentation.

-  //std::string res_json = "{ \"result\": " + std::to_string(sleep) + "}";
-  //return aws::lambda_runtime::invocation_response::success(res_json, "application/json");
dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

8-10: Consider improving build robustness and cleanup.

The build process could benefit from better error handling and cleanup to reduce image size and improve reliability.

-RUN   git clone https://github.com/awslabs/aws-lambda-cpp.git\
-      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt -B aws-lambda-cpp/build -S aws-lambda-cpp\
-      && cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install
+RUN set -e && \
+    git clone --depth 1 --branch v0.2.8 https://github.com/awslabs/aws-lambda-cpp.git && \
+    cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/opt \
+           -B aws-lambda-cpp/build -S aws-lambda-cpp && \
+    cmake3 --build aws-lambda-cpp/build --parallel ${WORKERS} --target install && \
+    rm -rf aws-lambda-cpp
dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1)

9-9: Consider adding integrity verification and optimizing cmake configuration.

The cmake configuration looks appropriate, but consider adding integrity checks and optimization flags.

-      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ ..\
+      && cmake3 -DCMAKE_BUILD_TYPE=Release -DBUILD_ONLY="s3;dynamodb;sqs" -DENABLE_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/opt/ -DCPP_STANDARD=17 -DCMAKE_CXX_FLAGS="-O3" ..\
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

13-15: Remove commented code.

The commented lines should be removed as they add confusion and don't serve any purpose in the final implementation.

-#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
-#      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
-#      && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
config/systems.json (1)

125-137: Consider explicit architecture specification for consistency.

The C++ configuration uses "all" for architecture while Python and Node.js specify explicit architectures ("x64", "arm64"). This inconsistency might cause confusion and could limit architecture-specific optimizations.

Consider updating to be consistent with other languages:

       "cpp": {
         "base_images": {
-          "all": "amazon/aws-lambda-provided:al2.2022.04.27.09"
+          "x64": "amazon/aws-lambda-provided:al2.2022.04.27.09",
+          "arm64": "amazon/aws-lambda-provided:al2.2022.04.27.09"
         },
         "dependencies": [
             "runtime", "sdk", "boost", "hiredis"
         ],
        "versions": ["all"],
         "images": ["build"],
dockerfiles/aws/cpp/Dockerfile.build (1)

18-21: Optimize script copying and permission setting.

Consider combining COPY and chmod operations to reduce layers.

-COPY docker/entrypoint.sh /sebs/entrypoint.sh
-COPY docker/cpp_installer.sh /sebs/installer.sh
-RUN chmod +x /sebs/entrypoint.sh
-RUN chmod +x /sebs/installer.sh
+COPY --chmod=755 docker/entrypoint.sh /sebs/entrypoint.sh
+COPY --chmod=755 docker/cpp_installer.sh /sebs/installer.sh
benchmarks/wrappers/aws/cpp/redis.hpp (1)

11-11: Consider using smart pointers for automatic resource management.

The raw redisContext* pointer requires manual memory management. Consider using std::unique_ptr<redisContext, void(*)(redisContext*)> with a custom deleter for automatic cleanup.

+#include <memory>
+
 class Redis
 {
-  redisContext* _context;
+  std::unique_ptr<redisContext, void(*)(redisContext*)> _context;
 public:
benchmarks/wrappers/aws/cpp/storage.hpp (1)

18-26: Consider interface consistency across wrapper classes.

The method signatures differ from other wrapper classes (Redis, KeyValue). Consider standardizing parameter patterns for consistency, especially for retry handling and timing measurements.

For example, the Redis class uses int& required_retries and bool with_backoff, while this class uses bool report_dl_time. Consider aligning these interfaces for consistency across the AWS wrapper ecosystem.

benchmarks/wrappers/aws/cpp/key-value.cpp (1)

19-20: Consider making configuration more flexible.

The hardcoded CA file path and commented region configuration reduce flexibility across different environments.

Consider reading these from environment variables:

-  //config.region = "eu-central-1";
-  config.caFile = "/etc/pki/tls/certs/ca-bundle.crt";
+  const char* region = std::getenv("AWS_REGION");
+  if (region) {
+    config.region = region;
+  }
+  const char* caFile = std::getenv("AWS_CA_BUNDLE");
+  config.caFile = caFile ? caFile : "/etc/pki/tls/certs/ca-bundle.crt";
benchmarks/wrappers/aws/cpp/handler.cpp (1)

23-23: Fix typo in comment.

"Gateaway" should be "Gateway".

-  // HTTP trigger with API Gateaway sends payload as a serialized JSON
+  // HTTP trigger with API Gateway sends payload as a serialized JSON
benchmarks/wrappers/aws/cpp/storage.cpp (3)

46-50: Clarify the NOP operation purpose.

The current implementation to prevent compiler optimization is unusual. Consider a simpler approach or document why this specific pattern is needed.

             // Perform NOP on result to prevent optimizations
             std::stringstream ss;
             ss << s.rdbuf();
-            std::string first(" ");
-            ss.get(&first[0], 1);
+            // Simply access the stream to prevent optimization
+            volatile auto size = ss.tellp();
+            (void)size;  // Suppress unused variable warning

59-63: Remove commented code or document why it's preserved.

The commented backoff logic should either be removed or documented if it's intentionally kept for future use.

         } else {
             retries += 1;
-            //int sleep_time = retries;
-            //if (retries > 100) {
-            //    sleep_time = retries * 2;
-            //}
-            //std::this_thread::sleep_for(std::chrono::milliseconds(sleep_time));
+            // TODO: Consider adding backoff delay for failed downloads
         }

75-78: Simplify the comment about bufferstream usage.

The comment is quite long and could be more concise while still explaining the rationale.

-    /**
-     * We use Boost's bufferstream to wrap the array as an IOStream. Usign a light-weight streambuf wrapper, as many solutions 
-     * (e.g. https://stackoverflow.com/questions/13059091/creating-an-input-stream-from-constant-memory) on the internet
-     * suggest does not work because the S3 SDK relies on proper functioning tellp(), etc... (for instance to get the body length).
-     */
+    // Use Boost's bufferstream to wrap the buffer as an IOStream.
+    // The S3 SDK requires proper tellp() functionality, so simple streambuf wrappers don't work.
sebs/aws/aws.py (1)

671-677: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
sebs/benchmark.py (1)

18-19: Clean up duplicate import.

Remove the duplicate Language import on line 19.

 from sebs.utils import find_benchmark, project_absolute_path, LoggingBase
-from sebs.types import BenchmarkModule, Language
 from sebs.types import Language
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d68834 and 9a96d72.

📒 Files selected for processing (30)
  • benchmarks/000.microbenchmarks/010.sleep/config.json (1 hunks)
  • benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/handler.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/key-value.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/redis.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/storage.hpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.cpp (1 hunks)
  • benchmarks/wrappers/aws/cpp/utils.hpp (1 hunks)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.build (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-hiredis (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-sdk (1 hunks)
  • dockerfiles/aws/nodejs/Dockerfile.build (1 hunks)
  • dockerfiles/cpp_installer.sh (1 hunks)
  • sebs.py (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/azure/azure.py (4 hunks)
  • sebs/benchmark.py (11 hunks)
  • sebs/faas/container.py (9 hunks)
  • sebs/faas/function.py (9 hunks)
  • sebs/faas/system.py (2 hunks)
  • sebs/gcp/gcp.py (4 hunks)
  • sebs/openwhisk/openwhisk.py (3 hunks)
  • sebs/types.py (2 hunks)
  • tools/build_docker_images.py (5 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/faas/container.py

[refactor] 143-143: Too many arguments (7/5)

(R0913)


[refactor] 143-143: Too many positional arguments (7/5)

(R0917)


[refactor] 143-143: Too many local variables (19/15)

(R0914)

tools/build_docker_images.py

[error] 51-51: Parsing failed: 'invalid syntax (build_docker_images, line 51)'

(E0001)

sebs/aws/aws.py

[refactor] 672-677: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

🪛 Ruff (0.11.9)
sebs/faas/function.py

251-253: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

tools/build_docker_images.py

51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


51-51: SyntaxError: Expected a statement


52-52: SyntaxError: Unexpected indentation


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-53: SyntaxError: Expected a statement


53-54: SyntaxError: Expected a statement


54-54: SyntaxError: Unexpected indentation


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected a statement


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


59-59: SyntaxError: Expected ',', found name


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


109-109: SyntaxError: Expected a statement


111-111: SyntaxError: Unexpected indentation


136-136: SyntaxError: unindent does not match any outer indentation level


136-136: SyntaxError: Expected a statement


136-137: SyntaxError: Expected a statement


137-137: SyntaxError: Unexpected indentation


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-138: SyntaxError: Expected a statement


138-139: SyntaxError: Expected a statement


140-140: SyntaxError: Unexpected indentation


149-149: SyntaxError: unindent does not match any outer indentation level


149-149: SyntaxError: Expected a statement


149-150: SyntaxError: Expected a statement


150-150: SyntaxError: Unexpected indentation


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected a statement


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


151-151: SyntaxError: Expected ',', found name


152-152: SyntaxError: Unexpected indentation


152-152: SyntaxError: Expected a statement


152-152: SyntaxError: Expected a statement


152-153: SyntaxError: Expected a statement


153-153: SyntaxError: Unexpected indentation

sebs/benchmark.py

19-19: Redefinition of unused Language from line 18

Remove definition: Language

(F811)

🔇 Additional comments (33)
dockerfiles/aws/nodejs/Dockerfile.build (1)

4-5: Update comment and review package dependencies for user management.

The # useradd, groupmod comment no longer matches the installed packages—you're now installing C++ build tools (cmake, curl, libcurl, libcurl-devel) but not shadow-utils (which provides useradd/groupmod). If you still rely on gosu to switch users, either re-add shadow-utils to the install line or remove/update the comment to reflect the new purpose.

sebs/types.py (1)

45-57: Well-structured Architecture enum implementation.

The Architecture enum with its serialize/deserialize methods follows good patterns and provides type safety for the codebase.
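As a rough illustration of the pattern being praised (a minimal sketch: the member values follow the "x64"/"arm64" strings used in the configs, but the actual sebs/types.py implementation may differ in details):

```python
from enum import Enum


class Architecture(Enum):
    X64 = "x64"
    ARM64 = "arm64"

    def serialize(self) -> str:
        # Emit the config-file string for this architecture.
        return self.value

    @staticmethod
    def deserialize(val: str) -> "Architecture":
        # Map a config-file string back to the enum member.
        for member in Architecture:
            if member.value == val:
                return member
        raise ValueError(f"Unknown architecture: {val}")
```

Round-tripping through serialize/deserialize returns the same member, which is what makes the enum safe to thread through caches and config files.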

benchmarks/000.microbenchmarks/010.sleep/config.json (1)

4-4: LGTM - Consistent language addition.

The addition of "cpp" to the languages array is consistent with the existing configuration pattern and enables C++ benchmark support.

sebs.py (1)

67-67: LGTM - Correct CLI integration.

The addition of "cpp" to the language choices properly integrates C++ support into the command-line interface.

sebs/faas/system.py (1)

16-16: LGTM! Excellent type safety improvement.

The addition of the Language enum import and updating the method signature to use strongly-typed Language instead of string-based language_name significantly improves type safety and code maintainability. This change aligns well with the broader refactoring across the codebase.

Also applies to: 174-174

dockerfiles/aws/cpp/Dockerfile.dependencies-runtime (1)

9-9: Approve the CMake configuration choices.

The CMake configuration looks well-thought-out:

  • Release build type for optimized binaries
  • BUILD_SHARED_LIBS=OFF for static linking (appropriate for Lambda)
  • Custom install prefix to /opt for multi-stage builds
  • Parallel builds using ${WORKERS} for build efficiency
config/systems.json (1)

129-131: Verify dependency order and completeness.

The dependencies list looks comprehensive, but ensure the order matches any build dependencies (e.g., runtime before SDK).

#!/bin/bash
# Verify that all dependency Dockerfiles exist for the listed dependencies
echo "Checking for dependency Dockerfiles..."
for dep in runtime sdk boost hiredis; do
    dockerfile="dockerfiles/aws/cpp/Dockerfile.dependencies-${dep}"
    if [ -f "$dockerfile" ]; then
        echo "✓ Found: $dockerfile"
    else
        echo "✗ Missing: $dockerfile"
    fi
done
sebs/gcp/gcp.py (3)

27-27: Good refactoring for type safety.

The import of the Language enum improves type safety and consistency across the codebase.


142-147: Type-safe dictionary access implementation looks correct.

The refactoring properly updates both CONFIG_FILES and HANDLER dictionaries to use Language enum keys, and all access patterns use the typed language parameter.


128-148: I didn’t locate the Python Language enum – let’s search for its class definition and inspect it for a CPP member:

#!/bin/bash
# Locate and display the Python enum for Language
echo "Searching for Language enum in .py files..."
rg -n "class Language" -g "*.py"

echo -e "\nShowing definition and first 20 lines after declaration:"
file=$(rg -l "class Language" -g "*.py" | head -n1)
if [[ -n "$file" ]]; then
  start_line=$(grep -n "class Language" "$file" | head -n1 | cut -d: -f1)
  sed -n "${start_line},$((start_line+20))p" "$file"
else
  echo "❌ Language enum definition file not found."
fi
dockerfiles/aws/cpp/Dockerfile.build (2)

23-26: Dependency aggregation looks correct.

The sequential copying from all dependency stages properly aggregates the built libraries and headers into /opt.


30-31: Verification script:

#!/bin/bash
echo "Locating Dockerfile.build files..."
fd --type f 'Dockerfile.build'

echo
echo "Printing lines 1-60 of dockerfiles/aws/cpp/Dockerfile.build:"
sed -n '1,60p' dockerfiles/aws/cpp/Dockerfile.build

echo
echo "Looking for any COPY commands or /sebs references:"
grep -R -nE 'COPY|sebs' dockerfiles/aws/cpp/Dockerfile.build
sebs/openwhisk/openwhisk.py (3)

21-21: Good addition of type safety with Language enum import.

The import of the Language enum enhances type safety and consistency across the codebase.

100-100: Parameter type change improves type safety.

Changing from language_name: str to language: Language provides compile-time type checking and prevents invalid language strings.

111-111: Correct usage of language.value for backward compatibility.

Using language.value maintains compatibility with downstream methods expecting string identifiers.
sebs/azure/azure.py (3)

27-27: Good addition of Language enum import for type safety.

The import enhances type consistency across cloud provider modules.

122-122: Parameter type change improves type safety.

The change from string to Language enum parameter provides better type checking.

155-155: Dictionary lookup now uses proper enum type.

Good use of the Language enum for dictionary access instead of string.
benchmarks/wrappers/aws/cpp/key-value.hpp (1)

13-13: Good use of shared_ptr for AWS client management.

Using shared_ptr for the DynamoDB client is appropriate as it handles the AWS SDK object lifecycle correctly.
benchmarks/wrappers/aws/cpp/storage.hpp (2)

12-14: Good use of move semantics in constructor.

The constructor correctly uses move semantics to efficiently transfer ownership of the S3Client, avoiding unnecessary copies.

16-16: Static factory method follows good design patterns.

The get_client() static factory method is a good design choice for encapsulating client creation logic.
<details>
<summary>sebs/faas/container.py (3)</summary>

`12-12`: **LGTM: Type-safe language handling.**

The import of the `Language` enum improves type safety and consistency across the codebase.

---

`143-151`: **LGTM: Method signature update for type safety.**

The change from `language_name: str` to `language: Language` parameter improves type safety and aligns with the broader refactoring to use strongly-typed language identifiers.

---

`164-164`: **LGTM: Consistent use of Language enum.**

The usage of `language.value` to extract the string representation is correct and consistent throughout the method.




Also applies to: 188-188, 201-201

</details>
<details>
<summary>sebs/aws/aws.py (3)</summary>

`25-25`: **LGTM: Import addition aligns with type safety improvements.**

The import of the `Language` enum from `sebs.types` supports the refactoring to use strongly-typed language identifiers throughout the codebase.

---

`121-121`: **LGTM: Parameter type improved with enum.**

Changing the parameter type from string to `Language` enum enhances type safety and prevents invalid language values.

---

`139-175`: ```bash
#!/bin/bash
set -eo pipefail

# 1. List all files under benchmarks/ to inspect their extensions
find benchmarks/ -type f

# 2. Search for any 'zip' commands or references to 'benchmark.zip' in those files
grep -R --include="*.sh" --include="Makefile" --include="*.py" -nE "zip|benchmark\.zip" benchmarks/ || true

# 3. Search for any build directory creations or references
grep -R -nE "mkdir.*build|cmake|make" benchmarks/ || true
sebs/faas/function.py (3)

12-12: LGTM: Centralized enum imports improve consistency.

Importing Language and Architecture from the shared sebs.types module enhances modularity and ensures consistent enum usage across the codebase.


221-227: LGTM: Enhanced JSON parsing for C++ triggers.

The enhanced parsing logic correctly handles AWS C++ trigger responses that may have nested JSON bodies, improving compatibility with the new C++ runtime.
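
One way such nested bodies can be unwrapped is sketched below; this is a standalone illustration, not the exact sebs code, and the field name in the test data is illustrative:

```python
import json

def parse_trigger_response(raw: str) -> dict:
    """Parse a trigger response whose 'body' may itself be a JSON-encoded string."""
    result = json.loads(raw)
    body = result.get("body")
    if isinstance(body, str):
        try:
            # C++ triggers may return the payload as a serialized inner document.
            result["body"] = json.loads(body)
        except json.JSONDecodeError:
            pass  # leave the body as a plain string
    return result
```
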


290-292: LGTM: Consistent use of enum deserialization.

The update to use Language.deserialize() maintains consistency with the typed enum approach throughout the codebase.

sebs/benchmark.py (3)

7-7: LGTM: Appropriate import for CMake script formatting.

The textwrap import is correctly added to support the dedenting of the CMake script in the new C++ packaging method.


410-450: Well-structured C++ packaging method with future improvement opportunities.

The CMake script generation is well-implemented and correctly handles C++ Lambda runtime dependencies. The FIXME comments appropriately identify areas for future enhancement.

Consider addressing the FIXME comments in future iterations:

  • Generic configuration for different cloud providers
  • Optional dependency configuration
  • Dynamic dependency resolution

629-629: LGTM: Enhanced error logging for debugging.

Adding stderr decoding to the Docker build error logging provides better diagnostic information when builds fail.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
sebs/aws/aws.py (1)

676-682: Simplify the conditional structure.

The elif after return is unnecessary and can be simplified for better readability.

 def cloud_runtime(self, language: Language, language_version: str):
     if language in [Language.NODEJS, Language.PYTHON]:
         return ("{}{}".format(language, language_version),)
-    elif language == Language.CPP:
+    if language == Language.CPP:
         return "provided.al2"
-    else:
-        raise NotImplementedError()
+    raise NotImplementedError()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a96d72 and b5fbb12.

📒 Files selected for processing (5)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (16 hunks)
  • sebs/faas/container.py (6 hunks)
  • sebs/faas/function.py (3 hunks)
  • sebs/openwhisk/openwhisk.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • sebs/openwhisk/openwhisk.py
  • sebs/faas/function.py
🧰 Additional context used
🪛 Pylint (3.3.7)
sebs/aws/aws.py

[refactor] 677-682: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

sebs/faas/container.py

[refactor] 137-137: Too many arguments (7/5)

(R0913)


[refactor] 137-137: Too many positional arguments (7/5)

(R0917)


[refactor] 137-137: Too many local variables (19/15)

(R0914)

🔇 Additional comments (15)
sebs/faas/container.py (3)

12-12: LGTM: Good type safety improvement.

The addition of the Language enum import supports the transition from string-based language handling to a more type-safe approach.


102-106: LGTM: Improved code formatting.

The parameter formatting improves readability without changing functionality.


140-140: LGTM: Proper enum usage throughout the method.

The method correctly uses Language enum type for the parameter and language.value when string representation is needed for Docker operations and file paths.

Also applies to: 158-158, 181-181, 193-193

sebs/aws/aws.py (3)

25-25: LGTM: Type safety improvement.

Adding the Language enum import supports better type safety throughout the codebase.


121-121: LGTM: Proper enum adoption.

The method signature correctly uses the Language enum, and the CONFIG_FILES dictionary properly maps enum values to their respective configuration files.

Also applies to: 144-146


148-180: LGTM: Well-structured C++ packaging logic.

The conditional logic properly handles different languages. The C++ implementation assumes the build script creates a zip file in the build directory, which aligns with typical CMake build patterns for AWS Lambda.

sebs/benchmark.py (9)

7-7: LGTM: Required imports for C++ support.

The textwrap import is needed for the CMakeLists.txt generation, and the Language enum import supports the type safety improvements.

Also applies to: 18-18


30-30: LGTM: Proper type annotation.

The type annotation correctly uses the Language enum for better type safety.


180-180: LGTM: Consistent enum usage in hash operations.

The hash methods now properly use the Language enum, maintaining consistency with the broader refactoring.

Also applies to: 253-253


257-282: LGTM: Comprehensive C++ file support.

The file handling dictionaries properly include C++ file types (*.cpp, *.hpp, dependencies.json) and integrate them consistently with existing Python and Node.js support.


327-334: LGTM: Updated copy logic for C++ files.

The copy_code method correctly handles C++ files using the Language enum.


413-454: Well-implemented C++ CMakeLists.txt generation.

The CMakeLists.txt generation includes all necessary dependencies for AWS Lambda C++ runtime:

  • AWS Lambda runtime
  • Boost libraries
  • AWS SDK (S3, DynamoDB)
  • Hiredis for Redis support

The FIXME comments appropriately mark areas for future enhancement (generic deployment support, optional dependencies).
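
The generation step can be illustrated with a minimal sketch using `textwrap.dedent`; the real template in sebs/benchmark.py is more involved, and the dependency target names here are illustrative:

```python
import textwrap

def generate_cmakelists(benchmark: str, dependencies: list) -> str:
    """Emit a minimal CMakeLists.txt for an AWS Lambda C++ benchmark."""
    # Fixed preamble for the AWS Lambda C++ runtime.
    script = textwrap.dedent(f"""\
        cmake_minimum_required(VERSION 3.13)
        project({benchmark} LANGUAGES CXX)
        add_executable({benchmark} main.cpp handler.cpp)
        find_package(aws-lambda-runtime REQUIRED)
        target_link_libraries({benchmark} PRIVATE AWS::aws-lambda-runtime)
        """)
    # One link line per optional dependency (Boost, hiredis, ...).
    for dep in dependencies:
        script += f"target_link_libraries({benchmark} PRIVATE {dep})\n"
    return script
```
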


541-546: LGTM: C++ package file configuration.

Correctly identifies CMakeLists.txt as the package file for C++ builds, consistent with CMake build systems.


632-632: LGTM: Improved error logging.

Adding stderr output to error logging provides better debugging information when Docker builds fail.


643-643: LGTM: Updated method signature for type safety.

The build method now properly uses the Language enum in its callable parameter type, maintaining consistency with the broader refactoring.

Also applies to: 684-684

mcopik and others added 7 commits December 6, 2025 17:17
This is a major change - previously, we incorrectly built a code package together with a container.
Now, we no longer do that. Furthermore, we extended cache support to properly separate both types of deployments.
Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

7-12: Fix compiler mismatch and add checksum verification for the downloaded archive.

There are two critical issues:

  1. Compiler mismatch (line 10): The yum install on line 7 installs gcc-c++, but the code attempts to use gcc10-c++, which is not installed. The which gcc10-c++ command will fail and break the build. Either install gcc10-c++ explicitly via yum (if available), or use the installed gcc or g++ as a fallback.

  2. Missing integrity verification (line 8): The downloaded Boost archive lacks checksum verification, creating a supply-chain security risk. Verify the archive against a known SHA256 checksum before extraction.

Apply this diff to fix both issues:

-RUN curl -LO https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz\
-      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0\
-      && echo "using gcc : : $(which gcc10-c++) ;"  >> tools/build/src/user-config.jam\
+RUN curl -LO https://archives.boost.io/release/1.79.0/source/boost_1_79_0.tar.gz\
+      && echo "d6eef4b4cacb2183f2bf265a5a03a354c3cebf82140bfe5b4ae831dcf72e4e8 boost_1_79_0.tar.gz" | sha256sum -c -\
+      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0\
+      && GCC_CMD=$(which g++) && echo "using gcc : : ${GCC_CMD} ;"  >> tools/build/src/user-config.jam\

This change:

  • Adds SHA256 checksum verification before extraction.
  • Replaces the hardcoded gcc10-c++ with a flexible lookup of the available g++ compiler.
🧹 Nitpick comments (2)
dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1)

3-4: Set a default value for the WORKERS variable.

Line 4 sets WORKERS from the build argument, but without a default value it may be unset at runtime, potentially causing issues with the build parallelism in line 12. Provide a sensible default (e.g., 4 or based on CPU count).

 ARG WORKERS
-ENV WORKERS=${WORKERS}
+ENV WORKERS=${WORKERS:-4}
sebs/types.py (1)

48-54: Consider adding __str__ method for consistency with Language enum.

Language defines __str__ returning self.value, but Architecture only has serialize(). If both enums are used similarly (e.g., in string formatting or logging), adding __str__ would provide consistent behavior.

     def serialize(self) -> str:
         return self.value
+
+    def __str__(self) -> str:
+        return self.value
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9fb0dd3 and c6d5e0e.

📒 Files selected for processing (2)
  • dockerfiles/aws/cpp/Dockerfile.dependencies-boost (1 hunks)
  • sebs/types.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
🧬 Code graph analysis (1)
sebs/types.py (2)
sebs/benchmark.py (2)
  • deserialize (68-79)
  • serialize (291-292)
sebs/faas/function.py (13)
  • deserialize (37-40)
  • deserialize (53-56)
  • deserialize (71-74)
  • deserialize (113-116)
  • deserialize (159-167)
  • deserialize (267-268)
  • deserialize (281-282)
  • deserialize (312-316)
  • deserialize (395-396)
  • serialize (262-263)
  • serialize (277-278)
  • serialize (318-319)
  • serialize (382-391)
🪛 Ruff (0.14.7)
sebs/types.py

35-35: Create your own exception

(TRY002)


35-35: Avoid specifying long messages outside the exception class

(TRY003)


60-60: Create your own exception

(TRY002)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
sebs/types.py (1)

25-38: LGTM!

The Language enum is well-implemented with proper deserialization logic and string representation. The deserialize method correctly uses val in the exception message for debugging purposes.
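
A minimal standalone sketch of this enum pattern (the member set and the use of `ValueError` are assumptions; as the Ruff findings note, the real sebs/types.py raises a bare `Exception`):

```python
from enum import Enum

class Language(Enum):
    PYTHON = "python"
    NODEJS = "nodejs"
    CPP = "cpp"

    def __str__(self) -> str:
        return self.value

    @staticmethod
    def deserialize(val: str) -> "Language":
        # Match the stored string value back to an enum member.
        for member in Language:
            if member.value == val:
                return member
        raise ValueError(f"Unknown language type {val}")
```
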

Comment on lines +13 to +15
#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
# && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
# && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
Copy link


🛠️ Refactor suggestion | 🟠 Major

Remove or document the commented-out alternative build path.

Lines 13–15 contain commented-out code that uses an alternative Boost download URL from boostorg.jfrog.io. Clarify the intent: either remove this dead code or add an explanatory comment documenting why the alternative approach is retained (e.g., for future fallback or testing).

-#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
-#      && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
-#      && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
-
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#RUN curl -LO https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz\
# && tar -xf boost_1_79_0.tar.gz && cd boost_1_79_0 && ./bootstrap.sh --prefix=/opt\
# && ./b2 -j${WORKERS} --prefix=/opt cxxflags="-fPIC" link=static install
🤖 Prompt for AI Agents
In dockerfiles/aws/cpp/Dockerfile.dependencies-boost around lines 13 to 15,
there is a commented-out alternative Boost build block using boostorg.jfrog.io;
either delete these three commented lines to remove dead code or replace them
with a brief explanatory comment stating why the alternate URL/build is
preserved (e.g., fallback for network issues, historical reference, or testing),
including the Boost version and rationale for keeping it so future maintainers
understand intent.

Comment on lines +55 to +60
@staticmethod
def deserialize(val: str) -> Architecture:
for member in Architecture:
if member.value == val:
return member
raise Exception(f"Unknown architecture type {member}")
Copy link


⚠️ Potential issue | 🔴 Critical

Fix variable name in exception message.

The exception references member instead of val. After the loop completes, member will be Architecture.ARM (the last member), so the error message will incorrectly report "Unknown architecture type arm64" instead of the actual unknown input value.

     @staticmethod
     def deserialize(val: str) -> Architecture:
         for member in Architecture:
             if member.value == val:
                 return member
-        raise Exception(f"Unknown architecture type {member}")
+        raise Exception(f"Unknown architecture type {val}")
🧰 Tools
🪛 Ruff (0.14.7)

60-60: Create your own exception

(TRY002)


60-60: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In sebs/types.py around lines 55 to 60, the deserialize method raises an
Exception that interpolates the loop variable `member` instead of the input
`val`, causing incorrect error text; update the raise to reference `val` (e.g.,
raise Exception(f"Unknown architecture type {val}")) so the actual unknown input
is reported (optionally consider using ValueError for a more appropriate
exception type).

Copy link

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
sebs/aws/aws.py (3)

175-185: Critical: cloud_runtime still missing .value for enum string conversion.

Despite past review comments claiming this was fixed, lines 179 and 183 still don't use .value to extract the string from the Language enum. The f-string f"{language}{language_version}" will render as "Language.PYTHON3.10" instead of "python3.10".

Apply this diff to fix the runtime string construction:

         if language == Language.NODEJS:
-            return f"{language}{language_version}.x"
+            return f"{language.value}{language_version}.x"
         elif language == Language.CPP:
             return "provided.al2"
         elif language in [Language.PYTHON]:
-            return f"{language}{language_version}"
+            return f"{language.value}{language_version}"
         else:
             raise NotImplementedError()
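
Whether the bare f-string misrenders depends on whether the enum defines `__str__`; a toy enum without one shows the default behavior that `.value` sidesteps:

```python
from enum import Enum

class Lang(Enum):
    PYTHON = "python"

# Without a custom __str__, plain Enum members format as "ClassName.MEMBER".
assert f"{Lang.PYTHON}3.10" == "Lang.PYTHON3.10"
# .value always yields the underlying string, regardless of __str__.
assert f"{Lang.PYTHON.value}3.10" == "python3.10"
```
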

146-163: Major: Shell injection risk and incorrect path handling remain unfixed.

Despite past review comments, multiple issues persist:

  1. Line 148: shell=True with untrusted input (S604) is a security risk
  2. Line 161: Double path join os.path.join(directory, benchmark_archive) where benchmark_archive already includes the directory prefix for C++ (line 158)

For the Python/Node.js path (lines 146-154), replace shell execution with Python's zipfile:

-            # FIXME: use zipfile
-            # create zip with hidden directory but without parent directory
-            execute("zip -qu -r9 {}.zip * .".format(benchmark), shell=True, cwd=directory)
-            benchmark_archive = "{}.zip".format(os.path.join(directory, benchmark))
+            benchmark_archive = os.path.join(directory, f"{benchmark}.zip")
+            from zipfile import ZipFile, ZIP_DEFLATED
+            with ZipFile(benchmark_archive, "w", compression=ZIP_DEFLATED) as zf:
+                for entry in os.listdir(directory):
+                    if entry != os.path.basename(benchmark_archive):
+                        entry_path = os.path.join(directory, entry)
+                        if os.path.isfile(entry_path):
+                            zf.write(entry_path, arcname=entry)
             self.logging.info("Created {} archive".format(benchmark_archive))
 
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)

For the C++ path (lines 156-163), fix the duplicate path join:

         elif language == Language.CPP:
             # lambda C++ runtime build scripts create the .zip file in build directory
             benchmark_archive = os.path.join(directory, "build", "benchmark.zip")
             self.logging.info("Created {} archive".format(benchmark_archive))
 
-            bytes_size = os.path.getsize(os.path.join(directory, benchmark_archive))
+            bytes_size = os.path.getsize(benchmark_archive)
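
The zipfile-based replacement suggested above can be exercised standalone; this sketch mirrors the diff (top-level files only), not the merged code:

```python
import os
from zipfile import ZipFile, ZIP_DEFLATED

def pack_directory(directory: str, archive_name: str = "benchmark.zip") -> str:
    """Zip every regular file directly under `directory`, without the parent prefix."""
    archive = os.path.join(directory, archive_name)
    with ZipFile(archive, "w", compression=ZIP_DEFLATED) as zf:
        for entry in sorted(os.listdir(directory)):
            path = os.path.join(directory, entry)
            # Skip the archive itself and anything that is not a regular file.
            if entry != archive_name and os.path.isfile(path):
                zf.write(path, arcname=entry)
    return archive
```

Note that, like the suggested diff, this skips subdirectories such as `function/`; a full replacement would need `os.walk` to preserve the nested layout.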

136-144: Major: Self-move risk in function directory creation.

Lines 139-144 can attempt to move the function directory into itself, causing a crash on re-pack.

Apply this diff:

-            os.makedirs(function_dir)
+            os.makedirs(function_dir, exist_ok=True)
             # move all files to 'function' except handler.py
             for file in os.listdir(directory):
-                if file not in package_config:
+                if file not in package_config and file != "function":
                     file = os.path.join(directory, file)
                     shutil.move(file, function_dir)
sebs/benchmark.py (1)

422-469: Critical: Logging statement has syntax error.

Line 460-463 mixes f-string syntax with string concatenation incorrectly, causing a runtime error.

Apply this diff to fix:

         self.logging.info(
-            f"CPP benchmark {self.benchmark} has "
-            + str(len(self._benchmark_config._cpp_dependencies))
-            + " dependencies."
+            f"CPP benchmark {self.benchmark} has "
+            f"{len(self._benchmark_config._cpp_dependencies)} dependencies."
         )

Additionally, consider making the hardcoded source file list at line 429-431 configurable via the benchmark's config.json for flexibility with different file structures.

🧹 Nitpick comments (4)
sebs/cache.py (2)

382-395: Consider using explicit key checks instead of try/except for control flow.

Using exception handling for control flow is generally discouraged in Python. Consider using explicit checks:

-            package_exists = True
-            try:
-                config[deployment_name][language][main_key][key]
-            except KeyError:
-                package_exists = False
-
-                """
-                    We have no such cache entry - fallback.
-                    However, we still have directory, a possible leftover after crash.
-                    Whatever was left, we remove it since we have no information what is there.
-                """
-                if os.path.exists(cached_dir):
-                    shutil.rmtree(cached_dir)
+            package_exists = (
+                deployment_name in config
+                and language in config.get(deployment_name, {})
+                and main_key in config.get(deployment_name, {}).get(language, {})
+                and key in config.get(deployment_name, {}).get(language, {}).get(main_key, {})
+            )
+
+            if not package_exists:
+                """
+                    We have no such cache entry - fallback.
+                    However, we still have directory, a possible leftover after crash.
+                    Whatever was left, we remove it since we have no information what is there.
+                """
+                if os.path.exists(cached_dir):
+                    shutil.rmtree(cached_dir)
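
The explicit-check alternative generalizes to a small helper; the cache layout keys in the test are taken from the surrounding diff, while the helper itself is a sketch:

```python
def nested_key_exists(config: dict, *keys: str) -> bool:
    """Return True iff config[k1][k2]...[kn] resolves through nested dicts."""
    node = config
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True
```
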

396-434: Logic is correct; consider extracting shared code copying logic.

The update path correctly handles both code packages and containers, with appropriate metadata updates. The logic for copying code directories/ZIP files (lines 398-416) is duplicated from add_code_package. While acceptable for this PR, consider extracting this to a private helper method in a follow-up to reduce duplication.

sebs/faas/container.py (1)

217-227: Improve error logging in exception handler.

The error handling correctly logs build failures, but per Ruff TRY400, use logging.exception instead of logging.error in the except block to automatically include stack traces.

Apply this diff:

         except docker.errors.BuildError as e:
-            self.logging.error("Docker build failed!")
+            self.logging.exception("Docker build failed!")
 
             for chunk in e.build_log:
                 if "stream" in chunk:
-                    self.logging.error(chunk["stream"])
+                    self.logging.exception(chunk["stream"])
-            raise e
+            raise
sebs/cpp_dependencies.py (1)

88-92: Optional: Consider custom exception class for consistency.

Ruff suggests creating a custom exception class instead of raising generic Exception with a message. While not critical, this would improve consistency with Python best practices.

Consider creating a custom exception:

+class UnknownDependencyError(ValueError):
+    """Raised when an unknown C++ dependency type is encountered."""
+    pass
+
 @staticmethod
 def deserialize(val: str) -> CppDependencies:
     for member in CppDependencies:
         if member.value == val:
             return member
-    raise Exception(f"Unknown C++ dependency type {val}")
+    raise UnknownDependencyError(val)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6d5e0e and 27e053d.

📒 Files selected for processing (9)
  • config/systems.json (1 hunks)
  • dockerfiles/aws/cpp/Dockerfile.function (1 hunks)
  • sebs/aws/aws.py (6 hunks)
  • sebs/benchmark.py (19 hunks)
  • sebs/cache.py (2 hunks)
  • sebs/cpp_dependencies.py (1 hunks)
  • sebs/faas/container.py (9 hunks)
  • sebs/faas/system.py (7 hunks)
  • sebs/regression.py (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • config/systems.json
  • sebs/regression.py
  • sebs/benchmark.py
  • dockerfiles/aws/cpp/Dockerfile.function
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • config/systems.json
  • sebs/regression.py
📚 Learning: 2025-06-19T16:15:01.811Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.

Applied to files:

  • sebs/regression.py
🧬 Code graph analysis (6)
sebs/regression.py (2)
sebs/sebs.py (2)
  • get_deployment (83-142)
  • logging_filename (40-41)
sebs/aws/aws.py (1)
  • initialize (82-94)
sebs/cache.py (1)
sebs/benchmark.py (7)
  • code_package (114-115)
  • code_location (122-132)
  • language_version (168-169)
  • architecture (184-185)
  • container_deployment (188-189)
  • language (160-161)
  • code_size (151-152)
sebs/cpp_dependencies.py (1)
sebs/benchmark.py (1)
  • deserialize (69-80)
sebs/faas/container.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/config.py (1)
  • benchmark_base_images (60-65)
sebs/aws/aws.py (4)
sebs/types.py (1)
  • Language (25-38)
sebs/faas/system.py (1)
  • container_client (71-72)
sebs/aws/container.py (1)
  • ECRContainer (13-79)
sebs/benchmark.py (4)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
sebs/faas/system.py (3)
sebs/benchmark.py (11)
  • benchmark (102-103)
  • Benchmark (96-963)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • is_cached (135-136)
  • is_cached (139-140)
  • container_deployment (188-189)
  • container_uri (155-157)
  • build (660-807)
  • code_location (122-132)
sebs/faas/container.py (1)
  • DockerContainer (16-236)
sebs/types.py (1)
  • Language (25-38)
🪛 Hadolint (2.14.0)
dockerfiles/aws/cpp/Dockerfile.function

[error] 6-6: unexpected '{'
expecting '#', '', ADD, ARG, CMD, COPY, ENTRYPOINT, ENV, EXPOSE, FROM, HEALTHCHECK, LABEL, MAINTAINER, ONBUILD, RUN, SHELL, STOPSIGNAL, USER, VOLUME, WORKDIR, a pragma, at least one space, or end of input

(DL1000)

🪛 Ruff (0.14.7)
sebs/cpp_dependencies.py

92-92: Create your own exception

(TRY002)


92-92: Avoid specifying long messages outside the exception class

(TRY003)


100-100: Avoid specifying long messages outside the exception class

(TRY003)

sebs/faas/container.py

222-222: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


226-226: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


227-227: Use raise without specifying exception name

Remove exception name

(TRY201)

sebs/aws/aws.py

126-126: Unused method argument: language_version

(ARG002)


127-127: Unused method argument: architecture

(ARG002)


129-129: Unused method argument: is_cached

(ARG002)


148-148: Function call with shell=True parameter identified, security issue

(S604)

sebs/benchmark.py

268-268: Probable use of insecure hash functions in hashlib: md5

(S324)


652-652: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (17)
sebs/cache.py (1)

250-270: LGTM! Clean handling of container vs code package caching.

The conditional logic correctly differentiates between code packages (with a local code location) and container deployments (where code is pushed directly to a registry). Using relative paths for cached locations is a good practice for cache portability.

dockerfiles/aws/cpp/Dockerfile.function (1)

5-6: Static analysis warning is a false positive for template placeholders.

The Hadolint warning about unexpected { characters on line 6 is expected and can be safely ignored. The {{DEPENDENCY_IMAGES}} and {{DEPENDENCY_COPIES}} placeholders are intentional template markers that will be dynamically replaced by sebs/cpp_dependencies.py::CppDependencies.generate_dockerfile() during the build process. This templating approach enables selective inclusion of only the required C++ dependencies for each benchmark.

Also applies to: 20-21
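
The placeholder substitution can be sketched as plain string templating; the placeholder names come from the Dockerfile, but the function body is a guess at the mechanics, not the actual sebs/cpp_dependencies.py code, and the stage names are illustrative:

```python
def render_dockerfile(template: str, dependency_images: list) -> str:
    """Substitute the {{...}} markers with per-dependency FROM/COPY stanzas."""
    froms = "\n".join(
        f"FROM {image} AS dependency{i}" for i, image in enumerate(dependency_images)
    )
    copies = "\n".join(
        f"COPY --from=dependency{i} /opt /opt" for i in range(len(dependency_images))
    )
    return (
        template.replace("{{DEPENDENCY_IMAGES}}", froms)
        .replace("{{DEPENDENCY_COPIES}}", copies)
    )
```
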

config/systems.json (1)

125-144: LGTM! C++ language configuration is well-structured.

The C++ configuration for AWS is properly aligned with the broader PR changes:

  • Base images correctly reference Amazon Linux 2 Lambda runtime
  • Dependency list matches the CppDependencies enum (runtime, sdk, boost, hiredis, opencv, igraph, torch)
  • Deployment files comprehensively include all necessary C++ wrappers and headers

Based on learnings, C++ support is correctly scoped to AWS only.

sebs/regression.py (5)

31-37: C++ benchmark list properly initialized.

The CPP benchmark list includes appropriate benchmarks that have C++ implementations (sleep, thumbnailer, image-recognition, graph-pagerank, graph-bfs), consistent with the project's C++ support scope.


155-159: Good addition of per-test metadata attributes.

Storing test_architecture, test_deployment_type, and test_benchmark as function attributes enables precise filtering later in the test discovery phase. This is cleaner than parsing test names.


216-236: AWSTestSequenceCpp follows established patterns.

The implementation is consistent with AWSTestSequencePython and AWSTestSequenceNodejs, using the same lock-based initialization and logging approach.


528-536: Consistent use of test metadata for filtering.

The refactored filtering correctly uses the new test_architecture and test_deployment_type attributes instead of parsing test names, improving maintainability.


454-462: The cpp+package exclusion for 411.image-recognition is intentional and justified.

The benchmark's C++ dependencies (torch and opencv) are heavy compiled libraries. AWS Lambda package deployments have a strict size limit (250 MB unzipped), while container image deployments can scale to 10 GB. The Python package.sh script explicitly measures and manages package sizes, including conditional logic to strip and compress torch for specific Python versions (3.8/3.9), confirming that the developers actively address size constraints. The exclusion for cpp+package deployments prevents deployment failures due to package size limits, while container deployments can accommodate the larger compiled libraries.

sebs/faas/container.py (1)

132-196: Enhanced build_base_image signature and custom Dockerfile support.

The updated signature correctly:

  1. Accepts Language enum instead of string for type safety
  2. Adds builder_image parameter for the build stage base
  3. Returns size in MB as third tuple element for metrics

The custom Dockerfile detection (lines 176-185) is essential for C++ benchmarks, which require dependency-specific Dockerfile generation via CppDependencies.generate_dockerfile().

sebs/faas/system.py (3)

70-72: Well-designed extension points for container builds.

The container_client property and finalize_container_build method provide clean hooks for platform-specific container build logic without breaking the base abstraction. AWS overrides container_client to return its ECRContainer, while other platforms return None.

Also applies to: 184-187


173-181: API signature consistently updated for Language enum.

The package_code signature change from language_name: str to language: Language strengthens type safety and aligns with the enum-based language handling throughout the codebase. The simplified return type Tuple[str, int] (removing container_uri) is cleaner since container builds are now handled separately.


318-323: Improved logging clarity for container vs package deployments.

The refactored logging (lines 318-323) clearly distinguishes between cached container functions (with container URI) and package functions (with code location), improving debugging experience.

sebs/cpp_dependencies.py (2)

154-199: Dockerfile generation logic correctly handles wildcards and dependencies.

The generate_dockerfile method properly:

  1. Computes required dependencies including core deps (SDK, RUNTIME, BOOST, HIREDIS)
  2. Generates FROM statements with stage aliases matching dependency values
  3. Handles both wildcard paths (line 188-190) and exact paths (line 192) for COPY statements
  4. Correctly splits wildcard paths to get destination directories

The wildcard handling is particularly important for libraries like libaws* that expand to multiple files.
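The wildcard destination computation described above can be sketched as a small standalone helper (the function name and stage/path values are illustrative, not the actual generate_dockerfile code):

```python
import os

def copy_statement(stage: str, path: str) -> str:
    """Emit a Dockerfile COPY --from line for an exact or wildcard path."""
    if "*" in path:
        # For a wildcard like /opt/lib/libaws*, the COPY destination must be
        # the containing directory, since the glob expands to multiple files.
        dest = os.path.dirname(path) + "/"
    else:
        # Exact paths copy to the same location in the final image.
        dest = path
    return f"COPY --from={stage} {path} {dest}"

print(copy_statement("sdk", "/opt/lib/libaws*"))
print(copy_statement("runtime", "/opt/bin/bootstrap"))
```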


94-122: CMake integration is well-structured.

The to_cmake_list method generates proper CMake commands:

  • find_package with REQUIRED flag
  • Conditional target_include_directories when cmake_dir is set
  • target_link_libraries with appropriate library variables

The use of CMake variables like ${AWSSDK_LINK_LIBRARIES} and ${TORCH_LIBRARIES} follows CMake best practices.
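A toy version of that generation pattern, in Python (the dependency data and helper signature are hypothetical; the real logic lives in CppDependencies.to_cmake_list):

```python
from typing import List, Optional

def to_cmake_list(package: str, libraries: List[str],
                  cmake_dir: Optional[str] = None) -> List[str]:
    """Toy generator mirroring the find_package / link pattern described above."""
    lines = [f"find_package({package} REQUIRED)"]
    if cmake_dir is not None:
        # Only emitted when the dependency ships its own cmake_dir.
        lines.append(
            f"target_include_directories(${{PROJECT_NAME}} PRIVATE {cmake_dir}/include)"
        )
    lines.append(
        f"target_link_libraries(${{PROJECT_NAME}} PRIVATE {' '.join(libraries)})"
    )
    return lines

cmds = to_cmake_list("AWSSDK", ["${AWSSDK_LINK_LIBRARIES}"])
for line in cmds:
    print(line)
```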

sebs/benchmark.py (3)

267-295: Language enum integration in hash computation is correct.

The hash_directory method properly:

  1. Accepts Language enum parameter for type safety
  2. Uses Language enum values as dictionary keys (lines 270-273, 275-278)
  3. Extracts string value with language.value for path construction (line 289)

This ensures hash consistency across the codebase.


660-807: Build method refactoring correctly separates container and package paths.

The refactored build logic properly:

  1. Returns 4-tuple (rebuilt, code_location, container_deployment, container_uri) for explicit path tracking
  2. Generates custom Dockerfiles for C++ container builds (lines 720-740)
  3. Calls container build via container_client.build_base_image (lines 742-750)
  4. Handles OpenWhisk's requirement for both container and package (lines 762-770)
  5. Falls back to package-only path for non-container deployments (lines 772-793)

The separation between container and package build paths is much cleaner than the previous implementation.


490-501: Good extraction of builder image name logic.

The new builder_image_name helper method reduces duplication between install_dependencies and build methods. Returning both versioned and unversioned names supports the fallback pattern at lines 532-545.

Comment on lines +404 to +409
if os.path.isdir(code_package.code_location):
cached_location = os.path.join(cached_dir, "code")
# could be replaced with dirs_exists_ok in copytree
# available in 3.8
shutil.rmtree(cached_location)
shutil.copytree(src=code_package.code_location, dst=cached_location)

⚠️ Potential issue | 🟠 Major

Potential FileNotFoundError when cached_location doesn't exist.

shutil.rmtree(cached_location) is called unconditionally, but the directory may not exist (e.g., if the package was cached but the code directory was never created, or after a partial cleanup). This will raise FileNotFoundError.

Apply this diff to add an existence check:

                     if os.path.isdir(code_package.code_location):
                         cached_location = os.path.join(cached_dir, "code")
                         # could be replaced with dirs_exists_ok in copytree
                         # available in 3.8
-                        shutil.rmtree(cached_location)
+                        if os.path.exists(cached_location):
+                            shutil.rmtree(cached_location)
                         shutil.copytree(src=code_package.code_location, dst=cached_location)

Alternatively, if Python 3.8+ is the minimum supported version, you can use dirs_exist_ok=True as noted in the comment:

                     if os.path.isdir(code_package.code_location):
                         cached_location = os.path.join(cached_dir, "code")
-                        # could be replaced with dirs_exists_ok in copytree
-                        # available in 3.8
-                        shutil.rmtree(cached_location)
-                        shutil.copytree(src=code_package.code_location, dst=cached_location)
+                        shutil.copytree(src=code_package.code_location, dst=cached_location, dirs_exist_ok=True)
🤖 Prompt for AI Agents
In sebs/cache.py around lines 404 to 409, calling shutil.rmtree(cached_location)
unconditionally can raise FileNotFoundError when cached_location doesn't exist;
update the code to check for existence before removing (e.g., if
os.path.exists(cached_location): shutil.rmtree(cached_location)) or, if Python
3.8+ is guaranteed, use shutil.copytree(..., dirs_exist_ok=True) and remove the
unconditional rmtree call so the copy will overwrite/create as needed.
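As a quick illustration of the failure mode and the `dirs_exist_ok` fix, a minimal self-contained sketch (the directory names stand in for code_package.code_location and the cache layout, and are not the project's real paths):

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical stand-ins for code_package.code_location and cached_dir.
base = Path(tempfile.mkdtemp())
src = base / "code_location"
src.mkdir()
(src / "handler.py").write_text("print('hello')\n")

cached_location = base / "cached_dir" / "code"
cached_location.parent.mkdir(parents=True)

# Unconditional rmtree raises FileNotFoundError on the first run,
# because cached_location does not exist yet.
try:
    shutil.rmtree(cached_location)
except FileNotFoundError:
    pass  # exactly the error the review flags

# dirs_exist_ok=True (Python 3.8+) creates or overwrites in one call.
shutil.copytree(src, cached_location, dirs_exist_ok=True)
shutil.copytree(src, cached_location, dirs_exist_ok=True)  # idempotent on re-run

print(sorted(p.name for p in cached_location.iterdir()))
```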

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
sebs.py (2)

163-169: Use context manager for file reading.

The file is opened without a context manager, which could lead to resource leaks if an exception occurs during JSON parsing.

     if storage_configuration is not None:
 
         for cfg_f in storage_configuration:
             sebs_client.logging.info(f"Loading storage configuration from {cfg_f}")
 
-            cfg = json.load(open(cfg_f, 'r'))
+            with open(cfg_f, 'r') as f:
+                cfg = json.load(f)
             append_nested_dict(config_obj, ["deployment", deployment, "storage"], cfg)
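A small sketch of why the context manager matters here: the handle is closed even when json.load raises on malformed input (the file is a throwaway temporary, not a real SeBS config):

```python
import json
import tempfile

# Write deliberately invalid JSON to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write("{not valid json")
    cfg_path = tmp.name

# Without a context manager, the handle survives the exception unclosed.
f = open(cfg_path, "r")
try:
    json.load(f)
except json.JSONDecodeError:
    pass
leaked = not f.closed  # True: nothing closed the handle
f.close()

# With a context manager, the handle is closed even though json.load raised.
with open(cfg_path, "r") as g:
    try:
        json.load(g)
    except json.JSONDecodeError:
        pass
print(leaked, g.closed)
```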

360-372: Unused benchmark_input_size argument.

The benchmark_input_size CLI argument is captured but never used. Currently, the regression_suite hardcodes size="test" (line 110 in regression.py). Should this CLI argument override the test size?

Additionally, prefix unused unpacked variables with underscore to satisfy linters:

-def regression(benchmark_input_size, benchmark_name, storage_configuration, **kwargs):
+def regression(benchmark_input_size, benchmark_name, storage_configuration, **kwargs):
     # for regression, deployment client is initialized locally
     # disable default initialization
-    (config, output_dir, logging_filename, sebs_client, _) = parse_common_params(
+    (config, _output_dir, _logging_filename, sebs_client, _) = parse_common_params(
         initialize_deployment=False, storage_configuration=storage_configuration, **kwargs
     )
sebs/regression.py (1)

217-237: Consider extracting common AWS deployment logic.

The get_deployment method is nearly identical across AWSTestSequencePython, AWSTestSequenceNodejs, and now AWSTestSequenceCpp. The only difference is the class-level lock reference.

This follows the existing pattern in the codebase, so acceptable for this PR. A future refactor could introduce a base AWSTestSequenceBase class with a shared get_deployment implementation.

config/nodejs.json (1)

1-101: Node.js config looks good; consider deduplicating shared experiment schema

The Node.js invocation‑overhead config is consistent with the C++/Python configs and uses sensible defaults for arm64/Node 16. If you expect these three files to stay in lockstep, you might later factor the common experiment/deployment structure into a shared base and keep only language‑specific differences here, to avoid drift.

sebs/openwhisk/openwhisk.py (1)

104-139: Tighten package_code robustness and make language support explicit

The refactor to use Language and return just (archive, size) looks good and fits the shared interface, and limiting CONFIG_FILES to Python/Node.js matches the fact that OpenWhisk doesn’t support C++ here. To make this a bit more robust and clearer:

  • Guard against unsupported languages explicitly instead of relying on a KeyError, e.g.:
-        CONFIG_FILES = {
-            Language.PYTHON: ["__main__.py"],
-            Language.NODEJS: ["index.js"],
-        }
-        package_config = CONFIG_FILES[language]
+        CONFIG_FILES = {
+            Language.PYTHON: ["__main__.py"],
+            Language.NODEJS: ["index.js"],
+        }
+        try:
+            package_config = CONFIG_FILES[language]
+        except KeyError:
+            raise NotImplementedError(f"OpenWhisk packaging not implemented for {language}")
  • Improve the zip subprocess call both for style and failure handling:
-        subprocess.run(
-            ["zip", benchmark_archive] + package_config,
-            stdout=subprocess.DEVNULL,
-            cwd=directory,
-        )
+        subprocess.run(
+            ["zip", benchmark_archive, *package_config],
+            stdout=subprocess.DEVNULL,
+            cwd=directory,
+            check=True,
+        )

This addresses the Ruff hint about list concatenation and ensures a failed zip will surface immediately instead of proceeding with a missing/corrupt archive.

  • If Ruff’s ARG002 warnings about language_version, architecture, and is_cached are noisy but you need to keep the unified signature, consider adding a short comment and # noqa: ARG002 on the definition, or using them in a no‑op tuple assignment, to document that they are intentionally unused for OpenWhisk.

These are all small robustness/clarity tweaks; the overall design and the fact that C++ is intentionally unsupported for OpenWhisk look correct.

Based on learnings, OpenWhisk is expected to support only Python and Node.js here, not C++.
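The effect of check=True can be shown in isolation; here a deliberately failing command stands in for the zip invocation (the exit code 3 is arbitrary):

```python
import subprocess
import sys

# A command that exits non-zero, standing in for a failing `zip` call.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Without check=True the failure is silent: the caller must inspect returncode.
result = subprocess.run(failing_cmd, stdout=subprocess.DEVNULL)
print(result.returncode)  # non-zero, but no exception was raised

# With check=True the failure surfaces immediately as CalledProcessError.
try:
    subprocess.run(failing_cmd, stdout=subprocess.DEVNULL, check=True)
    raised = False
except subprocess.CalledProcessError as e:
    raised = True
    print(e.returncode)
```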

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27e053d and 2c206a6.

📒 Files selected for processing (7)
  • config/cpp.json (1 hunks)
  • config/nodejs.json (1 hunks)
  • config/python.json (1 hunks)
  • docs/storage.md (2 hunks)
  • sebs.py (5 hunks)
  • sebs/openwhisk/openwhisk.py (6 hunks)
  • sebs/regression.py (7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • sebs/regression.py
  • sebs/openwhisk/openwhisk.py
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • sebs/openwhisk/openwhisk.py
🧬 Code graph analysis (3)
sebs.py (2)
benchmarks/wrappers/azure/python/storage.py (1)
  • storage (8-59)
benchmarks/wrappers/local/python/storage.py (1)
  • storage (7-57)
sebs/regression.py (2)
sebs/faas/function.py (2)
  • Trigger (178-268)
  • TriggerType (179-189)
sebs/sebs.py (2)
  • get_deployment (83-142)
  • logging_filename (40-41)
sebs/openwhisk/openwhisk.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/openwhisk/container.py (1)
  • OpenWhiskContainer (9-46)
sebs/aws/aws.py (2)
  • config (46-47)
  • package_code (122-168)
🪛 markdownlint-cli2 (0.18.1)
docs/storage.md

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.7)
sebs.py

360-360: Unused function argument: benchmark_input_size

(ARG001)


363-363: Unpacked variable output_dir is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


363-363: Unpacked variable logging_filename is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

sebs/openwhisk/openwhisk.py

108-108: Unused method argument: language_version

(ARG002)


109-109: Unused method argument: architecture

(ARG002)


111-111: Unused method argument: is_cached

(ARG002)


123-123: subprocess call: check for execution of untrusted input

(S603)


124-124: Consider ["zip", benchmark_archive, *package_config] instead of concatenation

Replace with ["zip", benchmark_archive, *package_config]

(RUF005)

🔇 Additional comments (9)
sebs.py (2)

67-67: LGTM!

Adding "cpp" to language choices aligns with the PR objective to support C++ benchmarks.


455-477: LGTM!

Good defensive improvement. The guards prevent KeyError when configuration sections are missing, which aligns with the selective section handling in storage_start.

sebs/regression.py (5)

16-38: LGTM!

Benchmark lists are appropriately extended. The benchmarks_cpp list correctly includes benchmarks that have C++ implementations, and "010.sleep" is now consistently available across all languages.


156-160: LGTM!

Clean approach for attaching per-test metadata. This enables proper filtering based on architecture and deployment type without parsing test names.


445-473: LGTM!

The filtering logic correctly:

  1. Excludes all C++ tests for arm64 architecture (line 455-456)
  2. Excludes 411.image-recognition for C++ package deployments (lines 462-463)

This aligns with the retrieved learnings that C++ support is limited to AWS. Based on learnings, C++ is not supported on Azure, GCP, or OpenWhisk.


496-497: LGTM!

C++ test sequence correctly added only for AWS provider. Based on learnings, C++ support is limited to AWS Lambda and is intentionally not available for Azure, GCP, or OpenWhisk.


529-540: LGTM!

Proper usage of the metadata attributes attached during test generation. This enables accurate per-test filtering.

config/cpp.json (1)

1-101: C++ experiment config aligns with AWS-only support

The schema matches the other language configs and correctly pins experiments.deployment and deployment.name to aws while keeping other providers as placeholders. Given that C++ is only supported on AWS in this framework, this separation looks appropriate; just avoid wiring C++ experiments to Azure/OpenWhisk until support is added.

Based on learnings, C++ support is currently limited to AWS only.

config/python.json (1)

1-101: Verify Python architecture token matches what the system expects

Here architecture is set to "x64", while other configs use "arm64". Please double‑check that "x64" is one of the accepted architecture identifiers in your deployment and packaging logic (e.g., vs "x86_64" or other canonical values) so experiments don’t silently misclassify or fall back.

Comment on lines +130 to 131
```

⚠️ Potential issue | 🟡 Minor

Remove erroneous code block syntax.

Lines 130–131 open and close a code block without specifying a language, creating invalid markdown. This appears to be unintentional. The bash code block should close at line 129.

Apply this diff to remove the erroneous code block:

 healthy: 192.168.0.20:9012
-```
-```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```
healthy: 192.168.0.20:9012
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

130-130: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/storage.md around lines 130 to 131, there is an extraneous code-block
closing/backtick sequence that opens and closes an empty code block (invalid
markdown) after the intended bash block; remove the stray three backticks on
lines 130–131 so the bash code block properly ends at line 129 and no
empty/invalid code block remains.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs/benchmark.py (1)

904-913: Fix inverted logic in code_package_modify – currently breaks zip-based deployments.

The new implementation:

def code_package_modify(self, filename: str, data: bytes):
    if self.container_deployment or not self.code_package_is_archive():
        assert self.code_location is not None
        self._update_zip(self.code_location, filename, data)
        ...
    else:
        raise NotImplementedError()

modifies the package when either:

  • container_deployment is True, or
  • the package is not an archive,

and raises NotImplementedError for the normal “zip-based deployment” case (non-container + archive). This is the opposite of the intended behavior and will prevent updates to standard zip packages (e.g., AWS zip functions) while attempting to treat container or directory-based packages as zips.

A safer implementation is:

     def code_package_modify(self, filename: str, data: bytes):
-        if self.container_deployment or not self.code_package_is_archive():
+        # Only support modifying zip-based, non-container code packages.
+        if (not self.container_deployment) and self.code_package_is_archive():
             assert self.code_location is not None
             self._update_zip(self.code_location, filename, data)
             new_size = self.code_package_recompute_size() / 1024.0 / 1024.0
             self.logging.info(
                 f"Modified zip package {self.code_location}, new size {new_size} MB"
             )
         else:
-            raise NotImplementedError()
+            raise NotImplementedError(
+                "Code package modification is only supported for zip-based deployments."
+            )
♻️ Duplicate comments (1)
sebs/aws/aws.py (1)

122-169: AWS package_code behavior is fine; shell-based zip is the main remaining concern.

The split between Python/Node.js (local zip creation) and C++ (prebuilt build/benchmark.zip) packaging looks consistent, and returning (benchmark_archive, bytes_size) matches the new interface.

The execute("zip -qu -r9 ...", shell=True, ...) call relies only on maintainer-controlled inputs here, so it’s low risk, but moving to zipfile.ZipFile would avoid shelling out and address the S604 warning. This has been suggested before and can be picked up when you next touch this path.

🧹 Nitpick comments (10)
sebs/local/local.py (2)

114-142: Compute package size from directory contents, not the directory inode.

bytes_size = os.path.getsize(directory) returns the size of the directory entry, not the total size of the packaged code, so the logged “Function size” is misleading. Using the existing Benchmark.directory_size helper here would give a more accurate value.

-        bytes_size = os.path.getsize(directory)
+        from sebs.benchmark import Benchmark
+        bytes_size = Benchmark.directory_size(directory)

282-307: Local correctly forbids container deployment but leaves container_uri unused.

The explicit NotImplementedError when container_deployment is True keeps Local aligned with the stated “no container deployment” support, which is good. Since container_uri isn’t used in either create_function or update_function, you may want to rename it to _container_uri to quiet linters, but behavior is otherwise fine.

sebs/openwhisk/openwhisk.py (3)

104-139: Enum-based packaging for OpenWhisk is fine; subprocess zip is acceptable but could be simplified.

Using Language.PYTHON / Language.NODEJS keys and package_config = CONFIG_FILES[language] is consistent with the new Language enum model. The subprocess.run(["zip", benchmark_archive] + package_config, ...) call operates only on maintainer-controlled paths, so it’s acceptable, though you could switch to ["zip", benchmark_archive, *package_config] for clarity or to Python’s zipfile in the future.


173-185: OpenWhisk now explicitly requires container deployment.

The guard:

if not container_deployment:
    raise RuntimeError("Non-container deployment is not supported in OpenWhisk!")

makes the constraint explicit, which is good. Just ensure all call sites pass container_deployment=True for OpenWhisk (and that tests don’t assume zip-only deployments).

Also note that container_uri is accepted but unused here; that’s harmless but could be renamed to _container_uri to appease linters.


273-300: Clarify error messaging in update_function for non-container deployments.

update_function raises when not container_deployment with a message about code location and container deployment. Since code_location is checked separately just below, you could simplify the first message to focus purely on the unsupported non-container mode, leaving the second check to explain missing code_location.

This is cosmetic only; behavior is correct.

sebs/azure/azure.py (1)

119-177: Azure packaging correctly adopts Language and validates code_location.

Using EXEC_FILES / CONFIG_FILES keyed by Language.PYTHON and Language.NODEJS matches the documented lack of C++ support on Azure while keeping the API future-proof. The _mount_function_code check for code_package.code_location is None is a useful early failure to avoid confusing downstream errors.

container_uri being unused in update_function is fine given Azure doesn’t support container deployment; you can prefix it with _ later if tooling noise becomes an issue.

sebs/benchmark.py (4)

28-80: CPP dependency plumbing into BenchmarkConfig looks good; consider unifying the Language import.

Adding cpp_dependencies: Optional[List[CppDependencies]] and deserializing via CppDependencies.deserialize aligns with the new C++ dependency model. The only minor nit is that BenchmarkConfig.deserialize still imports Language from sebs.faas.function while this module already imports Language from sebs.types; consolidating on the latter would avoid type drift across modules.


422-469: C++ CMake packaging works but duplicates some core dependency logic.

add_deployment_package_cpp generates a CMakeLists.txt that:

  • Adds the fixed set of C++ source files.
  • Always links aws-lambda-runtime, AWSSDK, and hiredis via PkgConfig.
  • Appends per-benchmark dependencies via CppDependencies.to_cmake_list.

This is consistent with “AWS-only C++” support and should work, but it partially duplicates the core dependencies (SDK/runtime/BOOST/HIREDIS) that CppDependencies.get_required_dependencies already models. Longer term, consider driving both the Dockerfile and CMake generation from CppDependencies.get_required_dependencies to keep the dependency set in one place.


660-782: build() ignores container_build_step, which may break deferred packaging for container-based deployments.

In the container-deployment branch:

if container_build_step is not None:
    self._code_location, self._code_size = package_build_step(
        os.path.abspath(self._output_dir),
        self.language,
        self.language_version,
        self.architecture,
        self.benchmark,
        self.is_cached_valid,
    )

the container_build_step argument is never used; package_build_step is called instead. That means any provider intending to use a special, deferred packaging step for container deployments (e.g., via a finalize_container_build hook) cannot, and the extra callback is effectively dead.

If the intent was to allow providers like OpenWhisk to plug in a post-container packaging callback, consider:

-        if container_build_step is not None:
-            self._code_location, self._code_size = package_build_step(
+        if container_build_step is not None:
+            self._code_location, self._code_size = container_build_step(
                 os.path.abspath(self._output_dir),
                 self.language,
                 self.language_version,
                 self.architecture,
                 self.benchmark,
                 self.is_cached_valid,
             )

Please verify how build() is invoked for each system and whether any caller currently relies on container_build_step.


650-653: Using logging.error with decoded stderr is fine; logging.exception is optional.

Logging e.stderr.decode() and the Docker volumes on docker.errors.ContainerError is useful for debugging failed dependency builds. If you want stack traces as well, you could switch the first call to self.logging.exception(...), but that’s optional.
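The difference the TRY400 hint points at, in isolation: logging.exception records the active traceback, while logging.error (without exc_info) logs the message only (the error text below is made up for the demo):

```python
import io
import logging

# Capture log output in memory so we can inspect it.
log_stream = io.StringIO()
logging.basicConfig(stream=log_stream, level=logging.ERROR, force=True)

try:
    raise ValueError("docker build failed")  # stand-in for ContainerError
except ValueError:
    logging.error("plain error")         # message only, no traceback
    logging.exception("with traceback")  # message plus the full traceback

output = log_stream.getvalue()
print("Traceback" in output)
```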

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c206a6 and 5a893a8.

📒 Files selected for processing (6)
  • sebs/aws/aws.py (8 hunks)
  • sebs/azure/azure.py (7 hunks)
  • sebs/benchmark.py (19 hunks)
  • sebs/gcp/gcp.py (6 hunks)
  • sebs/local/local.py (5 hunks)
  • sebs/openwhisk/openwhisk.py (10 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • sebs/openwhisk/openwhisk.py
  • sebs/azure/azure.py
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • sebs/openwhisk/openwhisk.py
  • sebs/benchmark.py
  • sebs/azure/azure.py
🧬 Code graph analysis (6)
sebs/openwhisk/openwhisk.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/openwhisk/container.py (1)
  • OpenWhiskContainer (9-46)
sebs/benchmark.py (4)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
sebs/aws/aws.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (3)
  • language (160-161)
  • language_version (168-169)
  • benchmark (102-103)
sebs/benchmark.py (3)
sebs/cpp_dependencies.py (4)
  • CppDependencies (22-199)
  • deserialize (88-92)
  • to_cmake_list (95-122)
  • generate_dockerfile (155-199)
sebs/faas/container.py (2)
  • DockerContainer (16-236)
  • name (19-20)
sebs/types.py (1)
  • Language (25-38)
sebs/gcp/gcp.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (8)
  • language (160-161)
  • language_version (168-169)
  • architecture (184-185)
  • benchmark (102-103)
  • is_cached (135-136)
  • is_cached (139-140)
  • code_package (114-115)
  • code_location (122-132)
sebs/local/local.py (3)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (5)
  • language (160-161)
  • architecture (184-185)
  • benchmark (102-103)
  • container_uri (155-157)
  • container_deployment (188-189)
sebs/experiments/config.py (2)
  • architecture (37-38)
  • container_deployment (41-42)
sebs/azure/azure.py (2)
sebs/types.py (1)
  • Language (25-38)
sebs/benchmark.py (4)
  • language (160-161)
  • benchmark (102-103)
  • code_package (114-115)
  • code_location (122-132)
🪛 Ruff (0.14.7)
sebs/openwhisk/openwhisk.py

108-108: Unused method argument: language_version

(ARG002)


109-109: Unused method argument: architecture

(ARG002)


111-111: Unused method argument: is_cached

(ARG002)


123-123: subprocess call: check for execution of untrusted input

(S603)


124-124: Consider ["zip", benchmark_archive, *package_config] instead of concatenation

Replace with ["zip", benchmark_archive, *package_config]

(RUF005)


182-182: Avoid specifying long messages outside the exception class

(TRY003)


224-227: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Unused method argument: container_uri

(ARG002)


281-284: Avoid specifying long messages outside the exception class

(TRY003)


297-300: Avoid specifying long messages outside the exception class

(TRY003)

sebs/aws/aws.py

126-126: Unused method argument: language_version

(ARG002)


127-127: Unused method argument: architecture

(ARG002)


129-129: Unused method argument: is_cached

(ARG002)


148-148: Function call with shell=True parameter identified, security issue

(S604)


338-338: Avoid specifying long messages outside the exception class

(TRY003)

sebs/benchmark.py

268-268: Probable use of insecure hash functions in hashlib: md5

(S324)


652-652: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


936-936: Avoid specifying long messages outside the exception class

(TRY003)

sebs/gcp/gcp.py

129-129: Unused method argument: language_version

(ARG002)


130-130: Unused method argument: architecture

(ARG002)


132-132: Unused method argument: is_cached

(ARG002)


197-197: Avoid specifying long messages outside the exception class

(TRY003)


364-364: Unused method argument: container_uri

(ARG002)


371-371: Avoid specifying long messages outside the exception class

(TRY003)

sebs/local/local.py

118-118: Unused method argument: language_version

(ARG002)


119-119: Unused method argument: architecture

(ARG002)


120-120: Unused method argument: benchmark

(ARG002)


121-121: Unused method argument: is_cached

(ARG002)


287-287: Unused method argument: container_uri

(ARG002)


303-303: Unused method argument: container_uri

(ARG002)

sebs/azure/azure.py

123-123: Unused method argument: language_version

(ARG002)


124-124: Unused method argument: architecture

(ARG002)


126-126: Unused method argument: is_cached

(ARG002)


259-259: Unused method argument: container_uri

(ARG002)


385-385: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (13)
sebs/openwhisk/openwhisk.py (1)

46-76: Container client refactor for OpenWhisk looks consistent.

Using a private _container_client with a typed container_client property keeps this in line with other providers and centralizes container handling cleanly.

sebs/aws/aws.py (2)

53-56: Exposing container_client via a property is a clean API.

The container_client property returning self.ecr_client neatly standardizes container access across systems and keeps the public surface coherent.


320-339: Guarding against missing code packages in update_function is correct.

The explicit:

if package is None:
    raise RuntimeError("Code location is not set for zip deployment")

prevents silent misconfigurations for zip-based deployments and makes failures much easier to diagnose.

sebs/gcp/gcp.py (2)

125-183: GCP packaging correctly shifts to Language and enforces non-container deployment.

Enum-keyed CONFIG_FILES / HANDLER and package_config = CONFIG_FILES[language] are consistent with the new Language model, and renaming handler.py/handler.js to main.py/index.js matches GCP’s expectations. The explicit checks that container deployment is unsupported and that code_location must be set for deployments keep error modes clear.


359-372: Runtime checks for code_location in GCP updates are appropriate.

The guard:

if code_package.code_location is None:
    raise RuntimeError("Code location is not set for GCP deployment")

ensures function updates don’t proceed with an undefined package path, preventing obscure downstream failures during storage upload or patch operations.

sebs/benchmark.py (8)

266-295: Language-aware hashing and wrapper inclusion are sensible.

Extending hash_directory to use Language-keyed FILES/WRAPPERS (including C++ patterns and wrapper files under benchmarks/wrappers/{deployment}/{language}) ensures any language- or deployment-specific code changes invalidate the cache as expected. MD5 is acceptable here since this is purely for change detection, not security.


337-351: C++ file patterns in copy_code are wired correctly.

Including "*.cpp", "*.hpp", and "dependencies.json" for Language.CPP mirrors the hashing logic and ensures all relevant C++ sources and dependency metadata are copied into the build directory.


490-502: builder_image_name cleanly centralizes build image naming.

Returning both an unversioned and SeBS-versioned build image tag based on deployment, language_name, and language_version makes the subsequent Docker logic in install_dependencies and build more readable and consistent.


503-567: Language-aware dependency installation and C++ support look correct.

Using PACKAGE_FILES = {Language.PYTHON: "requirements.txt", Language.NODEJS: "package.json", Language.CPP: "CMakeLists.txt"} and selecting PACKAGE_FILES[self.language] ensures the Docker build images only run when a corresponding dependency file exists. Combined with builder_image_name, this keeps Python/Node.js/C++ handling uniform.


702-782: Container path’s use of package_build_step for OpenWhisk-style code packages should be double-checked.

For container_deployment=True, you both:

  • Build the container image via container_client.build_base_image(...), and
  • Optionally create a code package using package_build_step(...).

That’s fine for AWS (where the extra package is unused), but for providers that require a separate code package in addition to the container (e.g., OpenWhisk), it’s important that the function passed as package_build_step really is the platform-specific packaging routine. Please confirm that OpenWhisk (and any other container-only providers) are wired this way.


795-807: Return tuple from build() matches new packaging/container model.

Returning (True, self._code_location, self._container_deployment, self._container_uri) after updating the cache is consistent with how callers distinguish between package- and container-based flows, and matches the earlier fast-path returns when the cache is valid.


819-877: Input preparation refactor mainly improves clarity.

The updated prepare_input signature and the reformatted allocate_nosql loop:

for (
    name,
    table_properties,
) in self._benchmark_input_module.allocate_nosql().items():

are functionally equivalent to the prior version and keep behavior unchanged. Storage and NoSQL setup still follow the same contract defined by BenchmarkModuleInterface.


919-941: Archive detection and size recomputation helpers are consistent with the new deployment model.

code_package_is_archive and code_package_recompute_size now correctly treat container deployments as non-archives and force an explicit code_location presence for package-based deployments. Once code_package_modify is fixed as above, these helpers will behave as expected.

Comment on lines +495 to +499
billed_time = int(aws_vals["Billed Duration"])
memory = int(aws_vals["Memory Size"])
output.billing.billed_time = billed_time
output.billing.memory = memory
output.billing.gb_seconds = billed_time * memory
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "aws.py" -path "*/aws/*" | head -5

Repository: spcl/serverless-benchmarks

Length of output: 52


🏁 Script executed:

git ls-files | grep -E "(sebs|aws)" | head -20

Repository: spcl/serverless-benchmarks

Length of output: 914


🏁 Script executed:

head -20 sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 611


🏁 Script executed:

wc -l sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 89


🏁 Script executed:

sed -n '485,510p' sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 1210


🏁 Script executed:

rg "gb_seconds" --type py

Repository: spcl/serverless-benchmarks

Length of output: 581


🏁 Script executed:

rg -A 5 "class.*[Bb]illing" sebs/ --type py | head -40

Repository: spcl/serverless-benchmarks

Length of output: 307


🏁 Script executed:

grep -r "Billed Duration" sebs/ --type py

Repository: spcl/serverless-benchmarks

Length of output: 187


🏁 Script executed:

grep -B 5 -A 20 "class ExecutionBilling" sebs/faas/function.py

Repository: spcl/serverless-benchmarks

Length of output: 598


🏁 Script executed:

grep -B 5 -A 10 "Billed Duration\|Memory Size" sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 812


🏁 Script executed:

rg "aws_vals\[" sebs/aws/aws.py -A 1 -B 1 | head -50

Repository: spcl/serverless-benchmarks

Length of output: 860


🏁 Script executed:

rg "Billed Duration|Duration" sebs/ tests/ --type py -B 2 -A 2 | head -60

Repository: spcl/serverless-benchmarks

Length of output: 749


🏁 Script executed:

find . -name "*.py" -path "*/test*" | xargs grep -l "billing\|gb_seconds" 2>/dev/null

Repository: spcl/serverless-benchmarks

Length of output: 104


🏁 Script executed:

cat tests/aws/invoke_function_sdk.py

Repository: spcl/serverless-benchmarks

Length of output: 2879


🏁 Script executed:

find . -type f \( -name "*.md" -o -name "*.rst" -o -name "*.txt" \) | xargs grep -i "billed duration\|memory size" 2>/dev/null

Repository: spcl/serverless-benchmarks

Length of output: 192


🏁 Script executed:

rg "Duration.*second|second.*Duration" --type py -i

Repository: spcl/serverless-benchmarks

Length of output: 191


🌐 Web query:

AWS Lambda CloudWatch logs "Billed Duration" units milliseconds seconds

💡 Result:

Billed Duration is reported in milliseconds. [1][2]

Sources:

  • AWS Lambda pricing (duration billed rounded to nearest 1 ms). [1]
  • AWS announcement about 1 ms billing granularity (Dec 2020). [2]

🏁 Script executed:

rg "/ 1000|/ 1024|conversion" sebs/aws/ sebs/faas/ --type py -B 1 -A 1 | head -40

Repository: spcl/serverless-benchmarks

Length of output: 965


🏁 Script executed:

grep -B 3 -A 3 "int(float" sebs/aws/aws.py

Repository: spcl/serverless-benchmarks

Length of output: 574


🏁 Script executed:

python3 << 'EOF'
# Example: 1000 ms (1 second), 128 MB memory
billed_time_ms = 1000
memory_mb = 128

# Current incorrect calculation
incorrect = billed_time_ms * memory_mb
print(f"Current (incorrect): {billed_time_ms} ms * {memory_mb} MB = {incorrect}")
print(f"  Units: ms × MB (dimensionally wrong)")

# Correct calculation
correct = (billed_time_ms / 1000.0) * (memory_mb / 1024.0)
print(f"\nCorrect: ({billed_time_ms} ms / 1000) * ({memory_mb} MB / 1024) = {correct}")
print(f"  Units: seconds × GB = GB-seconds")
print(f"  As int: {int(correct)}")
EOF

Repository: spcl/serverless-benchmarks

Length of output: 156


Fix gb_seconds calculation to convert milliseconds and megabytes to GB-seconds.

The current computation multiplies milliseconds by megabytes, which has incorrect units. AWS billing uses GB-seconds, requiring conversion from the extracted values:

billed_time = int(aws_vals["Billed Duration"])  # in milliseconds
memory = int(aws_vals["Memory Size"])  # in megabytes
output.billing.gb_seconds = billed_time * memory  # wrong: ms × MB

Should be:

billed_time_ms = int(aws_vals["Billed Duration"])
memory_mb = int(aws_vals["Memory Size"])
output.billing.billed_time = billed_time_ms
output.billing.memory = memory_mb
output.billing.gb_seconds = int((billed_time_ms / 1000.0) * (memory_mb / 1024.0))
🤖 Prompt for AI Agents
In sebs/aws/aws.py around lines 495 to 499, the gb_seconds calculation
multiplies billed time (milliseconds) by memory (megabytes) producing incorrect
units; convert milliseconds to seconds and megabytes to gigabytes before
multiplying. Set billed_time and memory from aws_vals as integers, then compute
gb_seconds as int((billed_time / 1000.0) * (memory / 1024.0)) and assign those
values to output.billing.billed_time, output.billing.memory, and
output.billing.gb_seconds accordingly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.circleci/config.yml (2)

54-56: Add C++ Docker image builds to test-aws job.

This PR adds C++ benchmark support, but the test-aws job only pulls and caches Python 3.7 and Node.js 12.x Docker images. No C++ images are being built or tested in the CI pipeline.

Add Docker image pulling and saving for C++ builds. For example:

              docker pull mcopik/serverless-benchmarks:build.aws.python.3.7
              docker pull mcopik/serverless-benchmarks:build.aws.nodejs.12.x
+             docker pull mcopik/serverless-benchmarks:build.aws.cpp

Alternatively, if C++ images should be built from Dockerfiles in this repository, ensure they are built and cached in this step.


1-41: Add C++ build and linting jobs to the CI pipeline.

The PR adds C++ benchmarks with CMake builds, but there are no C++-specific jobs in this configuration to:

  1. Build C++ functions with CMake
  2. Verify C++ code formatting/linting
  3. Test the generated Docker containers

Consider adding a dedicated C++ build job that:

  • Installs CMake and C++ compiler dependencies
  • Builds C++ benchmarks
  • Validates Docker container generation
  • Verifies the base images (AWS SDK, Boost, Redis) are correctly configured

Would you like me to suggest a complete C++ build job configuration based on your CMake setup and Dockerfile structure?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a893a8 and 2b1aa03.

📒 Files selected for processing (2)
  • .circleci/config.yml (1 hunks)
  • README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
🔇 Additional comments (3)
README.md (1)

85-85: Verify Python 3.10+ requirement is necessary.

The Python version bump from 3.7+ to 3.10+ is a breaking change affecting users on Python 3.7–3.9. While this aligns with the broader codebase changes (Language enum, type system), it's unclear whether this version bump is strictly required for C++ support or driven by other dependencies.

Please clarify: Is 3.10+ a hard requirement for the new C++ functionality, or can support for earlier Python versions be maintained? If this is a hard requirement, consider documenting the reason in the README or a migration guide.

.circleci/config.yml (2)

8-10: Verify Python 3.10 support for all dependencies.

Pinning the linting job to Python 3.10 explicitly improves reproducibility. However, ensure that all dependencies (especially AWS SDK, Boost, and Redis libraries being installed on line 21) support Python 3.10.


79-83: Include test-aws job in the main workflow.

The main workflow (lines 79-83) only runs the linting job. The test-aws job is defined but never executed. This means integration tests are not running in the CI pipeline.

Verify whether test-aws should be included in the workflow:

  main:
    jobs:
      - linting
+     - test-aws

Also consider whether the test-aws job should now be renamed to test-aws-python or if a new test-aws-cpp job should be added alongside it.


orbs:
python: circleci/python@1.4.0
python: circleci/python@2.1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

circleci python orb 2.1 breaking changes compared to 1.4.0

💡 Result:

Short summary of likely breaking changes when moving circleci/python from 1.4.0 → 2.1

  • Major-version semantics: 2.x is a breaking release — expect incompatible changes to commands/parameters/executors. [1]
  • Default images / Python versions bumped (newer cimg images); some orb steps now run on images that enforce PEP 668 (can break orb commands that try to pip-install into system-managed envs). Update any steps that assume older Python image behavior. [2]
  • Migration from Python wrapper-scripts to shell-based implementations for some orbs/commands (done to avoid PEP 668 issues) — this can change how environment/exit codes/outputs behave and may rename parameters. [2]
  • Executors, command names or parameters may have been renamed or removed (minor/patch should be compatible, but major bumps may remove elements present in 1.4.0). Test and verify usages in your config. [1][3]

Action items

  • Inspect the circleci/python orb page and its changelog/versions in the Orb Registry for exact diffs and element-level changes. [3]
  • Run circleci orb source circleci/python@1.4.0 and circleci/python@2.1 (or view the registry source) and diff commands/jobs/executors used by your config.
  • Test in a branch and pin the orb to an exact semver (e.g. circleci/python@2.1.0) before rolling out.

Sources

  • Orb SemVer / breaking-change guidance. [1]
  • “externally-managed-environment” / PEP 668 note and migration to shell scripts. [2]
  • circleci/python orb registry page (versions & docs). [3]

🏁 Script executed:

# Check the full CircleCI config to understand Python version usage and executor configuration
cat -n .circleci/config.yml | head -100

Repository: spcl/serverless-benchmarks

Length of output: 3258


🏁 Script executed:

# Check if there are any CI logs or test results that show the orb 2.1 is working
git log --oneline --all -20 | head -20

Repository: spcl/serverless-benchmarks

Length of output: 977


🌐 Web query:

circleci python orb 2.1 default Python version executor

💡 Result:

The python/default executor in the circleci/python 2.x orb defaults to Python 3.8 (the executor has a parameter tag with default "3.8"). You can override it by declaring the executor with a tag, e.g.:

executor:
name: python/default
tag: "3.11"

[1][2]

Sources

  • CircleCI Discuss showing executor definition (default tag '3.8'). [1]
  • CircleCI docs / Python orb usage (shows using python/default executor and overriding via tag). [2]

Add explicit Python version to test-aws executor to match linting job.

The Python orb 2.1 introduces breaking changes from 1.4.0, including a bump to newer default images with PEP 668 enforcement. The linting job correctly specifies Python 3.10 explicitly, but the test-aws job uses python/default without a tag parameter (line 42), which defaults to Python 3.8 in orb 2.1. Add an explicit tag: '3.10' to the test-aws executor to ensure consistency and avoid unexpected behavior from implicit version differences.

🤖 Prompt for AI Agents
In .circleci/config.yml around lines 4 to 4 (and the test-aws executor usage
near line 42), the test-aws job relies on the orb's default Python image which
in python orb v2.1 defaults to 3.8 and can cause mismatches with the linting
job; update the test-aws executor declaration to explicitly set the python tag
to '3.10' (e.g., add tag: '3.10' under the python executor) so both jobs use the
same Python 3.10 image and avoid PEP 668/behavior differences from the orb
default.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
docs/build.md (1)

56-56: Replace the weak intensifier "very" with a more precise adjective.

The phrase "very small size limit" relies on an over-used intensifier. Consider using more descriptive language such as "minimal", "strict", or "tight" to better convey the constraint.

Example revision:

-Since OpenWhisk has a very small size limit on code packages, we deploy all functions as Docker images.
+Since OpenWhisk has a strict size limit on code packages, we deploy all functions as Docker images.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b1aa03 and 5ba7dee.

📒 Files selected for processing (2)
  • docs/benchmarks.md (1 hunks)
  • docs/build.md (3 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:15:01.811Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
📚 Learning: 2025-06-19T16:18:56.877Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.

Applied to files:

  • docs/benchmarks.md
  • docs/build.md
🪛 LanguageTool
docs/benchmarks.md

[locale-violation] ~32-~32: The phrase ‘In future’ is British English. Did you mean: “In the future”?
Context: ...ue to 30 second timeout of API gateway. In future, we might support Docker-based deployme...

(IN_FUTURE)

docs/build.md

[style] ~56-~56: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...taken in OpenWhisk. Since OpenWhisk has a very small size limit on code packages, we deploy ...

(EN_WEAK_ADJECTIVE)

🔇 Additional comments (3)
docs/build.md (2)

23-46: Excellent documentation of the build workflow.

The new "Code Package Build" section provides clear, step-by-step narrative of the build process with concrete examples (e.g., PyTorch, ffmpeg). The decomposition into logical stages (Query Cache, Prepare Environment, etc.) aligns well with the architecture and will help users understand the framework's build pipeline.


48-60: Platform coverage is clearly documented.

The updated Docker Image Build section accurately reflects C++ support limitations (AWS Lambda only) and provides explicit platform compatibility notes. The distinction between AWS Lambda and OpenWhisk workflows is well-articulated.

docs/benchmarks.md (1)

25-26: ARM limitation note is clear and well-placed.

The note correctly documents that ARM architecture is available only for AWS Lambda and that C++ benchmarks are not supported on ARM. This should be consistent with the benchmark table entries above.


| Type | Benchmark | Languages | Architecture | Description |
| :--- | :---: | :---: | :---: | :---: |
| Webapps | 010.sleep | Python, Node.js, C++ | x64, arm64 | Customizable sleep microbenchmark. |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n docs/benchmarks.md | head -40

Repository: spcl/serverless-benchmarks

Length of output: 3857


C++ benchmarks incorrectly list ARM architecture support, contradicting the documentation note.

Lines 6, 10, 14, and 16 claim C++ support on both x64 and arm64. However, the note at line 26 explicitly states "C++ benchmarks are currently not supported on the ARM architecture." Update the table to show x64 only for all C++ benchmarks.

Lines 6, 10, 14, 16:

-| Webapps      | 010.sleep    | Python, Node.js, C++ | x64, arm64 | Customizable sleep microbenchmark. |
+| Webapps      | 010.sleep    | Python, Node.js, C++ | x64 | Customizable sleep microbenchmark. |
-| Multimedia      | 210.thumbnailer    | Python, Node.js, C++ | x64, arm64 | Generate a thumbnail of an image. |
+| Multimedia      | 210.thumbnailer    | Python, Node.js, C++ | x64 | Generate a thumbnail of an image. |
-| Scientific      | 501.graph-pagerank    | Python, C++ | x64, arm64 | PageRank implementation with igraph. |
+| Scientific      | 501.graph-pagerank    | Python, C++ | x64 | PageRank implementation with igraph. |
-| Scientific      | 503.graph-bfs    | Python, C++ | x64, arm64 | Breadth-first search (BFS) implementation with igraph. |
+| Scientific      | 503.graph-bfs    | Python, C++ | x64 | Breadth-first search (BFS) implementation with igraph. |

Also fix line 32: change "In future" to "In the future".

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/benchmarks.md around lines 6, 10, 14, and 16, the table rows listing C++
benchmarks currently show "x64, arm64" but the document note states C++ is not
supported on ARM; update each of those C++ benchmark rows to list only "x64" as
the architecture. Also on line 32 change the phrase "In future" to "In the
future".

> [!WARNING]
> [!WARNING]
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix British English phrasing.

Line 32 uses "In future" which is British English. Align with American English conventions used elsewhere in the codebase.

-In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
+In the future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
> Benchmark `411.image-recognition` does not work on AWS with Python 3.9 due to excessive code size. While it is possible to ship the benchmark by zipping `torchvision` and `numpy` (see `benchmarks/400.inference/411.image-recognition/python/package.sh`), this significantly affects cold startup. On the lowest supported memory configuration of 512 MB, the cold startup can reach 30 seconds, making HTTP trigger unusable due to 30 second timeout of API gateway. In the future, we might support Docker-based deployment on AWS that are not affected by code size limitations.
🧰 Tools
🪛 LanguageTool

[locale-violation] ~32-~32: The phrase ‘In future’ is British English. Did you mean: “In the future”?
Context: ...ue to 30 second timeout of API gateway. In future, we might support Docker-based deployme...

(IN_FUTURE)

🤖 Prompt for AI Agents
In docs/benchmarks.md around line 32, the phrase "In future" uses British
English; replace it with American English by changing it to "In the future" so
the wording aligns with the rest of the codebase; update the sentence
accordingly and run a quick grep to ensure no other occurrences remain.

docs/build.md Outdated
An alternative to uploading a code package is to deploy a Docker image. This option is not supported on every platform, i.e., Azure Functions and Google Cloud Functions do not support custom Docker images.

On AWS Lambda, we support deploying functions as Docker images for Python, Node.js, and C++ functions.
We also support building arm64 images for this platform, except for C++ functions (extension to ARM functions is planned in future).

⚠️ Potential issue | 🟡 Minor

Clarify or remove the speculative future reference.

The phrase "extension to ARM functions is planned in future" is vague and potentially outdated for documentation. Either provide a concrete timeline/status or remove this statement to avoid confusion.

🤖 Prompt for AI Agents
In docs/build.md around line 53, the sentence mentioning "extension to ARM
functions is planned in future" is vague and potentially outdated; either remove
that speculative clause or replace it with a concrete, accurate status (e.g.,
"not currently supported" or "planned for Q3 2025") and, if available, add a
link or short note to an issue/roadmap for tracking. Update the line to state
only the current support status for arm64 images and, if you choose to keep a
future note, provide a specific timeframe or reference to an open issue/roadmap
entry.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/build.md (1)

68-68: Replace vague future-tense statement with current status.

The phrase "extension to ARM functions is planned in the future" is speculative and unhelpful for documentation. State the current limitation clearly instead.

Apply this diff to replace the speculative clause with a concrete statement:

-We support deploying functions as Docker images for Python, Node.js, and C++ functions. We also support building arm64 images for this platform, except for C++ functions (extension to ARM functions is planned in the future).
+We support deploying functions as Docker images for Python, Node.js, and C++ functions. We also support building arm64 images for Python and Node.js; C++ functions are not currently supported on ARM architectures.
🧹 Nitpick comments (1)
docs/build.md (1)

72-72: Strengthen the adjective description.

The intensifier "very" is weak; use a more precise term to convey the constraint.

Apply this diff to improve word choice:

-This platform has a very small size limit on code packages. Thus, we deploy all functions as Docker images.
+This platform has a strict size limit on code packages. Thus, we deploy all functions as Docker images.

Alternatives: "tight" or simply "small" if brevity is preferred. Based on static analysis feedback.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ba7dee and a1464e7.

📒 Files selected for processing (1)
  • docs/build.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
📚 Learning: 2025-06-19T16:19:09.758Z
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.

Applied to files:

  • docs/build.md
🪛 LanguageTool
docs/build.md

[style] ~72-~72: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...ure). ### OpenWhisk This platform has a very small size limit on code packages. Thus, we d...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
docs/build.md

6-6: Link fragments should be valid

(MD051, link-fragments)

By default, we deploy benchmark code as package uploaded to the serverless platform.
However, on some platforms we use [Docker images](#docker-image-build) instead.
By default, we deploy benchmark code as a package uploaded to the serverless platform. There, we use custom **build images** to install dependencies in an environment resembling the
function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).

⚠️ Potential issue | 🟡 Minor

Fix invalid link fragment reference.

The link fragment #docker-image-build does not match the actual heading. The section titled "## Docker Image Deployment" (line 39) would generate the anchor #docker-image-deployment.

Apply this diff to correct the link:

-function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
+function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-deployment) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-build) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
function executor in the cloud. However, on some platforms, we deploy functions as [Docker images](#docker-image-deployment) instead. There, we create one **function image** for each benchmark and configuration (language, language version, architecture).
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

6-6: Link fragments should be valid

(MD051, link-fragments)

🤖 Prompt for AI Agents
In docs/build.md around line 6, the link fragment #docker-image-build is
incorrect; replace it with the correct heading anchor #docker-image-deployment
so the link points to the "## Docker Image Deployment" section (update the
Markdown link to use #docker-image-deployment).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sebs/faas/system.py (1)

165-181: Fix the docstring to match the updated return type.

The docstring (lines 165-169) states the method returns "Container URI", but the actual return type at line 181 is Tuple[str, int], which includes only the path and size. The container URI is no longer part of the return value.

🔎 Proposed fix for the docstring
         Args:
             directory: Path to the code directory
-            language_name: Programming language name
+            language: Programming language
             language_version: Programming language version
             architecture: Target architecture (e.g., 'x64', 'arm64')
             benchmark: Benchmark name
             is_cached: Whether the code is cached
-            container_deployment: Whether to package for container deployment
 
         Returns:
             Tuple containing:
             - Path to packaged code
             - Size of the package
-            - Container URI
♻️ Duplicate comments (3)
benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (3)

17-29: Add error handling for graph generation.

igraph_barabasi_game returns an error code that should be checked. If graph generation fails, continuing with an invalid graph will cause undefined behavior.

uint64_t start_time = timeSinceEpochMicrosec();
int ret = igraph_barabasi_game(
  /* graph=    */ &graph,
  /* n=        */ size,
  /* power=    */ 1,
  /* m=        */ 10,
  /* outseq=   */ NULL,
  /* outpref=  */ 0,
  /* A=        */ 1.0,
  /* directed= */ 0,
  /* algo=     */ IGRAPH_BARABASI_PSUMTREE,
  /* start_from= */ 0
);
if (ret != IGRAPH_SUCCESS) {
  fprintf(stderr, "Graph generation failed with error %d\n", ret);
  return -1.0; // or handle error appropriately
}
graph_generation_time_ms = (timeSinceEpochMicrosec() - start_time);

Note: The boundary check issue for m=10 flagged in previous reviews is still present.


33-42: Add error handling for PageRank computation.

Both igraph_vector_init and igraph_pagerank return error codes that should be checked. Failures could lead to crashes or incorrect results.

int ret = igraph_vector_init(&pagerank, 0);
if (ret != IGRAPH_SUCCESS) {
  fprintf(stderr, "PageRank vector initialization failed\n");
  igraph_destroy(&graph);
  return -1.0;
}

uint64_t start_time = timeSinceEpochMicrosec();
ret = igraph_pagerank(&graph, IGRAPH_PAGERANK_ALGO_PRPACK,
                &pagerank, &value,
                igraph_vss_all(), IGRAPH_DIRECTED,
                /* damping */ 0.85, /* weights */ NULL,
                NULL /* not needed with PRPACK method */);
if (ret != IGRAPH_SUCCESS) {
  fprintf(stderr, "PageRank computation failed\n");
  igraph_vector_destroy(&pagerank);
  igraph_destroy(&graph);
  return -1.0;
}
compute_pr_time_ms = (timeSinceEpochMicrosec() - start_time);

Note: The directedness mismatch flagged in previous reviews (undirected graph with IGRAPH_DIRECTED) is still present.


44-47: Fix type confusion on error return.

Returning integer 1 is implicitly converted to igraph_real_t (typically double), producing 1.0. Since 1.0 is a valid PageRank value, callers cannot distinguish errors from success. This breaks error detection.

Consider using a sentinel value like -1.0 or NaN, or redesign the function to return an error code with the result via an output parameter.

if (fabs(value - 1.0) > 32*DBL_EPSILON) {
    fprintf(stderr, "PageRank failed to converge.\n");
    igraph_vector_destroy(&pagerank);
    igraph_destroy(&graph);
    return -1.0; // Sentinel indicating error
}

Note: The resource leak on early return flagged in previous reviews is still present (though the fix above addresses it).

🧹 Nitpick comments (1)
benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (1)

1-6: Consider adding header guards.

This .hpp file lacks include guards, which could cause issues if included multiple times. Consider adding:

#ifndef BENCHMARKS_GRAPH_PAGERANK_FUNCTION_HPP
#define BENCHMARKS_GRAPH_PAGERANK_FUNCTION_HPP
// ... existing content ...
#endif
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1464e7 and 32d611a.

📒 Files selected for processing (4)
  • README.md
  • benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp
  • benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp
  • sebs/faas/system.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • benchmarks/500.scientific/503.graph-bfs/cpp/main.cpp
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/azure/azure.py:137-0
Timestamp: 2025-06-19T16:19:09.758Z
Learning: C++ support in the serverless-benchmarks project is currently limited to AWS Lambda only. Azure Functions implementation (sebs/azure/azure.py) does not support C++ and only supports Python and Node.js languages.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: sebs/openwhisk/openwhisk.py:119-0
Timestamp: 2025-06-19T16:18:56.877Z
Learning: C++ language support in the serverless benchmarks framework is not available for OpenWhisk platform. C++ support is limited to specific platforms like AWS.
Learnt from: mcopik
Repo: spcl/serverless-benchmarks PR: 99
File: benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp:11-0
Timestamp: 2025-06-19T16:15:01.811Z
Learning: In the serverless-benchmarks project, input validation for benchmark functions is not required since the benchmarks are invoked directly by the maintainers who control the input values. This applies to functions like the sleep benchmark in benchmarks/000.microbenchmarks/010.sleep/cpp/main.cpp.
🧬 Code graph analysis (1)
sebs/faas/system.py (4)
sebs/local/local.py (3)
  • config (41-42)
  • name (29-30)
  • package_code (114-141)
sebs/openwhisk/openwhisk.py (4)
  • config (70-71)
  • finalize_container_build (133-138)
  • name (87-88)
  • package_code (104-131)
sebs/faas/container.py (2)
  • DockerContainer (16-236)
  • name (19-20)
sebs/types.py (1)
  • Language (25-38)
🔇 Additional comments (8)
benchmarks/500.scientific/501.graph-pagerank/cpp/function.hpp (1)

49-54: Proper resource cleanup on success path.

The cleanup sequence correctly destroys resources in the right order. This same pattern should be applied to all error paths.

sebs/faas/system.py (7)

4-4: LGTM!

The import additions support the new container deployment functionality and Language type migration.

Also applies to: 12-12, 17-17


70-72: LGTM!

The container_client property provides a clean extension point for platform-specific container implementations while maintaining backward compatibility with the None default.


184-187: LGTM!

The finalize_container_build extension point allows platforms to customize the container build process while maintaining backward compatibility. The callable signature properly matches package_code.


195-195: LGTM!

Making container_uri optional with str | None appropriately handles both container and non-container deployment scenarios.

Also applies to: 226-226


260-265: LGTM!

The named parameters in the format string improve readability and maintainability.


121-121: LGTM!

The inline formatting of the deployment list improves log readability by keeping related information on a single line.


319-324: No issues found. The Benchmark class defines both container_deployment (property at sebs/benchmark.py:188-189) and container_uri (property at sebs/benchmark.py:155-157) as documented properties. These are properly initialized in the constructor, so the code at lines 319-324 is safe and will not encounter AttributeError at runtime.
