Description:
Currently, user-uploaded task files (`.py`) are saved directly to the `worker/tasks` directory, posing a significant security risk by allowing arbitrary code execution in the main worker environment. To mitigate this, we need to implement a secure sandbox environment for validating and testing user-uploaded task files before they are made available to the production worker cluster.
This issue specifically addresses the "Execution Isolation (Proposed)" point outlined in `CHANGES.md`. The goal is to establish a robust and scalable mechanism to test user code in isolation, preventing malicious or faulty scripts from impacting the core system or other tasks.
Problem Statement:
Directly executing user-provided code without prior validation and isolation can lead to:
- Security vulnerabilities: Malicious code could access sensitive data, compromise the worker container, or interfere with other services.
- System instability: Faulty code could crash worker processes, consume excessive resources, or create deadlocks, impacting overall system reliability.
- Resource exhaustion: Untested code might have unbounded resource usage (CPU, memory), leading to denial-of-service for other tasks.
Proposed Solution (High-Level):
Implement a sandbox pool, leveraging technologies like gVisor or Firecracker (or a more basic containerization approach for initial implementation), to execute user-uploaded task files in an isolated and controlled environment. This will involve:
1. Staging Uploads: All user-uploaded task files will first land in a designated `UPLOAD_STAGING_DIR`.
2. Validation Queue: A message will be pushed to the `TASK_VALIDATION_QUEUE_NAME` queue, triggering a dedicated validation worker.
3. Sandbox Execution: The validation worker will pull files from the staging area, mount them into a sandbox container, and execute them with test payloads under strict
resource limits and monitoring.
4. Security Checks: Monitor for forbidden system calls, excessive resource usage, and ensure the file contains the expected `async def handler(payload: dict)` entry point.
5. File Promotion/Rejection: If the file passes validation, it will be moved to the `worker/tasks` directory for use by the main worker cluster. If it fails, it will be rejected (and potentially deleted or quarantined). A rough sketch of this validation flow is included immediately below.
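
As an illustration of steps 2–5, a minimal validation-worker loop could look like the sketch below. This is not existing code: it assumes a Redis list as the queue, a throwaway Docker container as the first isolation layer, and invented names (`run_in_sandbox`, the `validation:*` status keys, the message fields) chosen only for this example.

```python
# Sketch of the validation worker loop. Queue/key names, paths, and the message
# shape are illustrative; gVisor/Firecracker could later replace plain Docker.
import json
import shutil
import subprocess
from pathlib import Path

import redis

STAGING_DIR = Path("upload_staging")    # stand-in for settings.UPLOAD_STAGING_DIR
TASKS_DIR = Path("worker/tasks")
QUEUE_NAME = "task_validation_queue"    # stand-in for settings.TASK_VALIDATION_QUEUE_NAME

r = redis.Redis()

# Small harness run inside the container: import the file and call handler() once.
HARNESS = (
    "import asyncio, importlib.util; "
    "spec = importlib.util.spec_from_file_location('task', '/sandbox/task.py'); "
    "mod = importlib.util.module_from_spec(spec); spec.loader.exec_module(mod); "
    "asyncio.run(mod.handler({'ping': 'test'}))"
)

def run_in_sandbox(file_path: Path) -> bool:
    """Execute the candidate file in an isolated container under strict limits."""
    try:
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",            # no network access
                "--memory", "256m", "--cpus", "0.5",
                "-v", f"{file_path.resolve()}:/sandbox/task.py:ro",
                "python:3.12-slim", "python", "-c", HARNESS,
            ],
            capture_output=True,
            timeout=30,                         # hard wall-clock limit
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def validation_worker() -> None:
    while True:
        _, raw = r.blpop(QUEUE_NAME)            # block until a validation request arrives
        request = json.loads(raw)
        candidate = STAGING_DIR / request["filename"]
        passed = run_in_sandbox(candidate)
        if passed:
            shutil.move(candidate, TASKS_DIR / candidate.name)   # promote
        else:
            candidate.unlink(missing_ok=True)                    # reject/quarantine
        r.set(f"validation:{request['validation_id']}", "passed" if passed else "failed")
```

A container per run keeps the first iteration simple; a pooled or gVisor-backed sandbox can replace `run_in_sandbox` later without changing the queue contract.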
Key Tasks:
- Update `core/config.py`: Add `UPLOAD_STAGING_DIR` and `TASK_VALIDATION_QUEUE_NAME` settings (see the sketches after this list).
- Modify `api/routers/tasks.py` (`/upload_file` endpoint):
  - Save uploaded files to `UPLOAD_STAGING_DIR`.
  - Enqueue a validation request message to Redis's `TASK_VALIDATION_QUEUE_NAME`.
  - Return an appropriate `202 Accepted` response, indicating pending validation.
- Implement a new "Validation Worker" service:
  - Consume messages from `TASK_VALIDATION_QUEUE_NAME`.
  - Orchestrate sandbox creation/reuse.
  - Mount and execute the task file within the sandbox.
  - Implement resource limiting (CPU, memory, time) for sandbox execution.
  - Implement basic security checks (e.g., restricted syscalls, code linting, `handler` function presence).
  - Report validation status (pass/fail) to a persistent store (e.g., database or Redis key).
  - Move validated files to `worker/tasks` or delete/quarantine failed files.
- Adjust `api/routers/tasks.py` (`POST /tasks/` endpoint):
  - Before creating a task, verify that the referenced `.py` file has been successfully validated and moved to `worker/tasks`.
  - Return an error if the file is not found or not validated.
- (Optional but Recommended) Implement a `GET /tasks/validation_status/{validation_id}` endpoint: Allow users to query the validation status of their uploaded files.
- (Future Consideration) Integrate with gVisor or Firecracker: For true hardware-level isolation and enhanced security.
- Add comprehensive unit and integration tests for the entire validation workflow.
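
To make the configuration task concrete, here is a minimal sketch assuming `core/config.py` exposes a Pydantic `BaseSettings` class; the class name, defaults, and the `pydantic-settings` dependency are assumptions, and only the two field names come from this issue:

```python
# core/config.py (sketch). Assumes a Pydantic BaseSettings class; only the two
# new field names come from this issue, everything else is illustrative.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # ... existing settings ...
    UPLOAD_STAGING_DIR: str = "upload_staging"            # where raw uploads land
    TASK_VALIDATION_QUEUE_NAME: str = "task_validation"   # Redis queue for validation requests

settings = Settings()
```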
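A hedged sketch of the `/upload_file` changes, assuming FastAPI with an async Redis client; the router wiring, the JSON message shape, and the `validation_id` response field are illustrative rather than existing code:

```python
# api/routers/tasks.py (sketch). Router wiring, Redis client setup, and the
# response/message shapes are assumptions, not existing code.
import json
import uuid
from pathlib import Path

from fastapi import APIRouter, UploadFile, status
import redis.asyncio as redis

from core.config import settings  # assumes the new settings from core/config.py

router = APIRouter()
redis_client = redis.Redis()

@router.post("/upload_file", status_code=status.HTTP_202_ACCEPTED)
async def upload_file(file: UploadFile):
    validation_id = str(uuid.uuid4())
    staging_path = Path(settings.UPLOAD_STAGING_DIR) / file.filename

    # 1. Save the raw upload to the staging area, never directly to worker/tasks.
    staging_path.parent.mkdir(parents=True, exist_ok=True)
    staging_path.write_bytes(await file.read())

    # 2. Enqueue a validation request for the validation worker.
    await redis_client.rpush(
        settings.TASK_VALIDATION_QUEUE_NAME,
        json.dumps({"validation_id": validation_id, "filename": file.filename}),
    )

    # 3. 202 Accepted: the file is staged, validation is pending.
    return {"validation_id": validation_id, "status": "pending_validation"}
```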
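One of the cheaper security checks, confirming the `async def handler(payload: dict)` entry point exists before spending a sandbox run, can be done statically with the standard-library `ast` module. A sketch (the helper name is hypothetical):

```python
# Static pre-check (sketch): reject files lacking the required entry point before
# spending a sandbox run. The helper name is hypothetical.
import ast

def has_handler_entrypoint(source: str) -> bool:
    """Return True if the module defines a top-level `async def handler(payload)`."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # files that do not parse are rejected outright
    for node in tree.body:
        if (
            isinstance(node, ast.AsyncFunctionDef)
            and node.name == "handler"
            and len(node.args.args) == 1
            and node.args.args[0].arg == "payload"
        ):
            return True
    return False
```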
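Finally, a sketch of the task-creation guard and the optional status endpoint, assuming validation results are stored under Redis keys of the form `validation:{validation_id}` as in the worker sketch above; the key format, request shape, and response fields are all assumptions:

```python
# api/routers/tasks.py (sketch, continued). Assumes the validation worker writes
# "passed"/"failed" under validation:{id}; key format and fields are assumptions.
from pathlib import Path

from fastapi import APIRouter, HTTPException
import redis.asyncio as redis

router = APIRouter()
redis_client = redis.Redis()

@router.post("/tasks/")
async def create_task(task_file: str):
    # Only accept files the validator has already promoted into worker/tasks.
    if not (Path("worker/tasks") / task_file).is_file():
        raise HTTPException(status_code=400, detail="Task file not found or not yet validated")
    # ... proceed with normal task creation ...
    return {"status": "created", "task_file": task_file}

@router.get("/tasks/validation_status/{validation_id}")
async def validation_status(validation_id: str):
    value = await redis_client.get(f"validation:{validation_id}")
    if value is None:
        raise HTTPException(status_code=404, detail="Unknown validation id")
    return {"validation_id": validation_id, "status": value.decode()}
```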
Relevant Documents:
`CHANGES.md` (for the overall roadmap and context)