term-executor is a remote evaluation executor for the term-challenge platform. It runs as a containerized Rust service on Basilica that receives batch task archives via multipart upload, executes agent code against cloned task repositories, runs validation test scripts, and reports pass/fail results with aggregate rewards. It is the core compute backend that evaluates AI agent coding challenges.
This is a single-crate Rust binary (term-executor) built with Axum. There are no sub-crates or workspaces.
Validator → POST /submit (multipart archive) → term-executor
1. Authenticate via X-Hotkey, X-Nonce, X-Signature headers
2. Verify hotkey is in the dynamic validator whitelist (Bittensor netuid 100, >10k TAO stake)
3. Compute SHA-256 hash of archive bytes
4. Record vote in ConsensusManager
5. If <50% of whitelisted validators have voted for this hash:
→ Return 202 Accepted with pending_consensus status
6. If ≥50% consensus reached:
a. Extract uploaded archive (zip/tar.gz) containing tasks/ and agent_code/
b. Parse each task: workspace.yaml, prompt.md, tests/
c. For each task (concurrently, up to limit):
i. git clone the target repository at base_commit
ii. Run install commands (pip install, etc.)
iii. Write & execute agent code in the repo
iv. Write test source files into the repo
v. Run test scripts (bash), collect exit codes
d. Aggregate results (reward per task, aggregate reward)
e. Stream progress via WebSocket (GET /ws?batch_id=...)
f. Return results via GET /batch/{id}
ValidatorWhitelist refresh loop (every 5 minutes):
1. Connect to Bittensor subtensor via BittensorClient::with_failover()
2. Sync metagraph for netuid 100
3. Filter validators: validator_permit && active && stake >= 10,000 TAO
4. Atomically replace whitelist with new set of SS58 hotkeys
5. On failure: retry up to 3 times with exponential backoff, keep cached whitelist
ConsensusManager reaper loop (every 30 seconds):
1. Remove pending consensus entries older than TTL (default 60s)
| File | Responsibility |
|---|---|
src/main.rs |
Entry point — bootstraps config, session manager, executor, validator whitelist, consensus manager, Axum server, background tasks |
src/config.rs |
Config struct loaded from environment variables with defaults; Bittensor and consensus configuration |
src/handlers.rs |
Axum route handlers: /health, /status, /metrics, /submit, /batch/{id}, /batch/{id}/tasks, /batch/{id}/task/{task_id}, /batches |
src/auth.rs |
Authentication: extract_auth_headers(), verify_request() (whitelist-based), validate_ss58(), sr25519 signature verification via verify_sr25519_signature(), SS58 checksum via blake2, NonceStore for replay protection, AuthHeaders/AuthError types |
src/validator_whitelist.rs |
Dynamic validator whitelist — fetches validators from Bittensor netuid 100 every 5 minutes, filters by stake ≥10k TAO, stores SS58 hotkeys in parking_lot::RwLock<HashSet> |
src/consensus.rs |
50% consensus manager — tracks pending votes per archive hash in DashMap, triggers evaluation when ≥50% of whitelisted validators submit same payload, TTL reaper for expired entries |
src/executor.rs |
Core evaluation engine — spawns batch tasks that clone repos, run agents, run tests concurrently |
src/session.rs |
SessionManager with DashMap, Batch, BatchResult, TaskResult, BatchStatus, TaskStatus, WsEvent types |
src/task.rs |
Archive extraction (zip/tar.gz), task directory parsing, agent code loading, language detection |
src/metrics.rs |
Atomic counter-based Prometheus metrics (batches total/active/completed, tasks passed/failed, duration) |
src/cleanup.rs |
Work directory removal, stale session reaping, process group killing |
src/ws.rs |
WebSocket handler for real-time batch progress streaming |
AppState(inhandlers.rs) holdsConfig,SessionManager,Metrics,Executor,NonceStore,started_at,ValidatorWhitelist,ConsensusManagerSessionManagerusesDashMap<String, Arc<Batch>>for lock-free concurrent accessValidatorWhitelistusesparking_lot::RwLock<HashSet<String>>for concurrent read access with rare writesConsensusManagerusesDashMap<String, PendingConsensus>for lock-free concurrent vote tracking- Per-batch
Semaphoreinexecutor.rscontrols concurrent tasks within a batch (configurable, default: 8) broadcast::Sender<WsEvent>per batch for WebSocket event streaming
- Language: Rust (edition 2021, nightly toolchain for fmt/clippy)
- Async Runtime: Tokio (full features + process),
tokio-stream,futures - Web Framework: Axum 0.7 (json, ws, multipart) with Tower middleware,
tower-http(cors, trace) - HTTP Client: reqwest 0.12 (json, stream) for downloading task archives
- Serialization: serde + serde_json + serde_yaml
- Concurrency:
DashMap6,parking_lot0.12,tokio::sync::Semaphore,tokio::sync::broadcast - Archive Handling:
flate2+tar(tar.gz),zip2 (zip) - Error Handling:
anyhow1 +thiserror2 - Logging:
tracing+tracing-subscriberwith env-filter - Crypto/Identity:
sha2,hex,base64,bs58(SS58 address validation),schnorrkel0.11 (sr25519 signature verification),blake20.10 (SS58 checksum),rand_core0.6,uuidv4 - Blockchain:
bittensor-rs(git dependency) for Bittensor validator whitelisting via subtensor RPC - Time:
chronowith serde support - Build Tooling:
moldlinker via.cargo/config.toml,clangas linker driver - Container: Multi-stage Dockerfile —
rust:1.93-slim-bookwormbuilder →debian:bookworm-slimruntime (includes python3, pip, venv, build-essential, git, curl) - CI: GitHub Actions on
blacksmith-32vcpu-ubuntu-2404runners, nightly Rust
-
Always use
cargo +nightly fmt --allbefore committing. The CI enforces--checkand will reject unformatted code. The project uses the nightly formatter exclusively. -
All clippy warnings are errors. Run
cargo +nightly clippy --all-targets -- -D warningslocally. CI runs the same command and will fail on any warning. -
Never expose secrets in logs or responses. Auth failures log only the rejection, never the submitted hotkey value. Follow this pattern for any new secrets.
-
All process execution MUST have timeouts. Every call to
run_cmd/run_shellinsrc/executor.rstakes aDurationtimeout. Never spawn a child process without a timeout — agent code is untrusted and may hang forever. -
Output MUST be truncated. The
truncate_output()function insrc/executor.rscaps output atMAX_OUTPUT(1MB). Any new command output capture must use this function to prevent memory exhaustion from malicious agent output. -
Shared state must use
Arc+ lock-free structures.SessionManagerusesDashMap(notMutex<HashMap>). Metrics useAtomicU64.ValidatorWhitelistusesparking_lot::RwLock.ConsensusManagerusesDashMap. New shared state should follow these patterns — never usestd::sync::Mutexfor hot-path data. -
Semaphore must gate task concurrency. The per-batch
Semaphoreinexecutor.rslimits concurrent tasks within a batch. TheSessionManager::has_active_batch()check prevents multiple batches from running simultaneously. -
Session cleanup is mandatory. Every task must clean up its work directory in
src/executor.rs. The stale session reaper insrc/cleanup.rsis a safety net, not a primary mechanism. -
Error handling: use
anyhow::Resultfor internal logic,(StatusCode, Json<Value>)for HTTP responses. Handler functions insrc/handlers.rsreturnResult<impl IntoResponse, (StatusCode, Json<Value>)>. Internal executor/task functions returnanyhow::Result<T>. -
All new fields on serialized structs must use
#[serde(default)]orOption<T>. TheWorkspaceConfig,BatchResult, andTaskResultstructs are deserialized from external input or stored results. Missing fields must not break deserialization.
- Write unit tests for all new public functions (see existing
#[cfg(test)]modules in every file) - Use
tracing::info!/warn!/error!for logging (notprintln!) - Add new routes in
src/handlers.rsvia therouter()function - Use
tokio::fsfor async file I/O in the executor pipeline - Use conventional commits (
feat:,fix:,perf:,chore:, etc.)
- Do NOT add
unsafecode — there is none in this project and it should stay that way - Do NOT add synchronous blocking I/O in async functions — use
tokio::task::spawn_blockingfor CPU-heavy work (seeextract_archive_bytesinsrc/task.rs) - Do NOT store large data (agent output, test output) in memory without truncation
- Do NOT add new dependencies without justification — the binary must stay small for container deployment
- Do NOT use
unwrap()in production code paths — use?orcontext()from anyhow.unwrap()is only acceptable in tests and infallible cases (like parsing a known-good string) - Do NOT modify
.cargo/config.toml— it configures the mold linker for fast builds
# Build (debug)
cargo build
# Build (release, matches CI)
cargo +nightly build --release -j $(nproc)
# Run tests
cargo test
# Run tests (release, matches CI)
cargo +nightly test --release -j $(nproc) -- --test-threads=$(nproc)
# Format (required before commit)
cargo +nightly fmt --all
# Format check (what CI runs)
cargo +nightly fmt --all -- --check
# Lint (required before commit)
cargo +nightly clippy --all-targets -- -D warnings
# Run locally
PORT=8080 cargo run
# Docker build
docker build -t term-executor .The .githooks/ directory contains automated quality gates:
- Runs
cargo +nightly fmt --all -- --checkto enforce formatting - Runs
cargo +nightly clippy --all-targets -- -D warningsto enforce lint - Skip with
SKIP_GIT_HOOKS=1 git commit ...
- Runs format check, clippy, full test suite, and release build
- This is the full quality gate matching CI
- Skip with
SKIP_GIT_HOOKS=1 git push ...
Both hooks are activated via git config core.hooksPath .githooks.
| Variable | Default | Description |
|---|---|---|
PORT |
8080 |
HTTP listen port |
SESSION_TTL_SECS |
7200 |
Max batch lifetime before reaping |
MAX_CONCURRENT_TASKS |
8 |
Maximum parallel tasks per batch |
CLONE_TIMEOUT_SECS |
180 |
Git clone timeout |
AGENT_TIMEOUT_SECS |
600 |
Agent execution timeout |
TEST_TIMEOUT_SECS |
300 |
Test suite timeout |
MAX_ARCHIVE_BYTES |
524288000 |
Max uploaded archive size (500MB) |
WORKSPACE_BASE |
/tmp/sessions |
Base directory for session workspaces |
BITTENSOR_NETUID |
100 |
Bittensor subnet ID for validator lookup |
MIN_VALIDATOR_STAKE_TAO |
10000 |
Minimum TAO stake for validator whitelisting |
VALIDATOR_REFRESH_SECS |
300 |
Interval for refreshing validator whitelist (seconds) |
CONSENSUS_THRESHOLD |
0.5 |
Fraction of validators required for consensus (0.0–1.0) |
CONSENSUS_TTL_SECS |
60 |
TTL for pending consensus entries (seconds) |
MAX_PENDING_CONSENSUS |
100 |
Maximum number of pending consensus entries |
Authentication requires three HTTP headers: X-Hotkey (SS58 address), X-Nonce (unique per-request), and X-Signature (sr25519 hex signature of hotkey + nonce). The authorized hotkeys are dynamically loaded from the Bittensor blockchain — all validators on netuid 100 with ≥10,000 TAO stake and an active validator permit are whitelisted. The whitelist refreshes every 5 minutes. Verification steps (in order): hotkey must be in the validator whitelist, SS58 format must be valid, sr25519 signature must verify against the hotkey's public key using the Substrate signing context, and finally the nonce must not have been seen before (replay protection via NonceStore in src/auth.rs with 5-minute TTL — nonce is only consumed after signature passes). Only requests passing all checks can submit batches via POST /submit. Evaluations are only triggered when ≥50% of whitelisted validators have submitted the same archive payload (identified by SHA-256 hash). All other endpoints are open.