Comprehensive guide for running a Basilica validator node that verifies GPU compute resources and maintains network quality on the Bittensor network.
What it does: The validator discovers miners via the Bittensor metagraph, connects over SSH directly to their GPU nodes for verification, scores their performance, and sets network weights.
Minimum Requirements:
- Linux server: 48+ CPU cores, 140GB RAM, stable internet
- GPU: Nvidia 1xB200 is minimum for validator permit
- Bittensor wallet: Registered on subnet 39 (mainnet) or 387 (testnet) with sufficient stake
- SSH access: For remote node verification (ephemeral keys auto-generated)
Quick Setup (5 steps):
# 1. Ensure Bittensor wallet exists
btcli wallet list
# Should show your validator wallet and hotkey
# 2. Create minimal config
cat > validator.toml <<EOF
[bittensor]
wallet_name = "your_validator_wallet"
hotkey_name = "your_hotkey"
network = "finney"
netuid = 39
axon_port = 9090
external_ip = "your_public_ip"
[database]
url = "sqlite:./data/validator.db"
run_migrations = true
[verification]
verification_interval = { secs = 600, nanos = 0 }
max_concurrent_verifications = 50
netuid = 39
[api]
bind_address = "0.0.0.0:8080"
[ssh_session]
ssh_key_directory = "/tmp/validator_ssh_keys"
[emission]
burn_percentage = 0.0
burn_uid = 204
EOF
# 3. Build and run
./scripts/validator/build.sh
./basilica-validator --config validator.toml start
# 4. Verify operation
# Check logs for "Discovered X miners from metagraph"
# Check logs for "Verification completed" messages
# 5. Monitor via API
curl http://localhost:8080/health
curl http://localhost:8080/miners
Need details? See the sections below for the architecture explanation, verification workflow, weight setting, and advanced configuration.
- Overview
- Architecture
- Prerequisites
- Understanding the System
- SSH Key Management
- Validator Configuration
- Deployment Methods
- Verification Flow
- Weight Setting and Emissions
- Security & Best Practices
- Monitoring
- Troubleshooting
- Advanced Topics
The Basilica validator performs critical network functions that ensure GPU provider quality and distribute rewards fairly across the Bittensor network.
- Validator Server: Linux system (no GPU required)
  - 8+ CPU cores, 16GB+ RAM recommended
  - Stable internet connection with low latency
  - Public IP address or proper port forwarding
  - SQLite database (PostgreSQL supported)
- Bittensor Wallet: Registered on subnet
  - Mainnet (finney): netuid 39
  - Testnet: netuid 387
  - Sufficient TAO stake for validator permit
  - Hotkey registered on the subnet
- Network Access:
  - Outbound SSH access to miner nodes (port 22)
  - Inbound access on axon port (default: 9090)
  - Inbound access on API port (default: 8080)
- Miner Discovery: Query Bittensor metagraph to discover all miners on the subnet
- Node Verification: SSH directly to GPU nodes for cryptographic verification
- Performance Scoring: Calculate miner scores based on GPU capabilities and reliability
- Weight Setting: Distribute emissions based on GPU categories and performance
- API Service: Provide external access for rentals and network queries
Unlike traditional verification systems that rely on intermediary agents, Basilica validators use direct SSH access to GPU nodes for verification. This eliminates intermediaries and ensures cryptographic integrity.
┌─────────────────────────────────────────────────────────────┐
│ BITTENSOR NETWORK │
│ (Metagraph Query) │
└────────────────────────┬───────────────────────────────────┘
│
│ 1. Query metagraph for miners
↓
┌────────────────────────┐
│ VALIDATOR │
│ │
│ ┌──────────────────┐ │
│ │ Miner Discovery │ │
│ │ (metagraph) │ │
│ └──────────────────┘ │
│ ┌──────────────────┐ │
│ │ Verification │ │
│ │ Scheduler │ │
│ └──────────────────┘ │
│ ┌──────────────────┐ │
│ │ Weight Setter │ │
│ └──────────────────┘ │
│ ┌──────────────────┐ │
│ │ REST API │ │
│ └──────────────────┘ │
└────────┬───────────────┘
│
│ 2. Authenticate via gRPC
↓
┌────────────────────────┐
│ MINER (gRPC Server) │
│ - Validates signature │
│ - Returns SSH details │
└────────┬───────────────┘
│
│ 3. SSH Key Authorization
↓
┌────────────────────────────────────────────────┐
│ GPU NODES (SSH endpoints) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node N │ │
│ │ GPU: H100│ │ GPU: A100│ │ GPU: ... │ │
│ └─────▲────┘ └─────▲────┘ └─────▲────┘ │
│ │ │ │ │
│ └─────────────┴─────────────┘ │
│ 4. Validator SSHs directly │
│ to execute verification │
└────────────────────────────────────────────────┘
The validator employs a two-tier verification strategy that optimizes for both security and efficiency:
Full Validation (Binary + Hardware Profiling):
- Triggered: New nodes, >6 hours since last validation, or failed lightweight checks
- SSH to node → Upload binaries → Execute verification → Download results
- Validates: GPU attestation, Docker capability, storage, network, hardware specs
- Frequency: Every 6 hours per node
- Score weight: 100% (50% SSH success + 50% binary validation)
Lightweight Validation (SSH Accessibility):
- Triggered: Recently validated nodes (<6 hours)
- Quick SSH connection test
- Updates: Last seen timestamp
- Frequency: Every 10 minutes
- Score weight: Reuses previous validation score
Location: crates/basilica-validator/src/miner_prover/
Purpose: Main orchestrator for miner verification
Sub-components:
- MinerDiscovery: Fetches miners from Bittensor metagraph
- VerificationScheduler: Dual pipeline (full + lightweight) task scheduling
- VerificationEngine: Executes validation against nodes
- MinerClient: gRPC communication with miners
Flow:
- Discovery queries metagraph every verification_interval (default: 10 min)
- Scheduler determines which miners need verification
- Engine spawns concurrent verification tasks
- Results stored in database and aggregated for scoring
Location: crates/basilica-validator/src/bittensor_core/weight_setter.rs
Purpose: Distributes emissions based on GPU scoring
Flow:
- Checks current blockchain block every 12 seconds
- Every N blocks (default: 360), triggers weight setting
- Queries GPU scoring engine for miner scores by category
- Allocates weights based on emission configuration
- Applies burn percentage to burn_uid
- Submits weights to Bittensor chain
Location: crates/basilica-validator/src/api/mod.rs
Purpose: REST API for external services
Endpoints:
- Rental management (start, stop, status, logs)
- Node discovery (list available nodes)
- Miner queries (health, nodes, profiles)
- GPU profiles (list by category)
- Verification results
Hardware:
- CPU: 8+ cores recommended (4 minimum)
- RAM: 16GB+ recommended (8GB minimum)
- Storage: 100GB+ SSD (for database and logs)
- Network: Stable connection with <100ms latency to Bittensor chain
Operating System:
- Ubuntu 22.04 LTS (recommended)
- Debian 11+
- Any modern Linux distribution with systemd
# Update system
sudo apt update && sudo apt upgrade -y
# Install build dependencies (if building from source)
sudo apt install -y \
build-essential \
libssl-dev \
pkg-config \
protobuf-compiler \
git \
curl \
sqlite3
# Install Rust (if building from source)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Install Docker (optional, for Docker deployment)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Install Bittensor CLI
pip install bittensor
# Create validator wallet (if you don't have one)
btcli wallet new_coldkey --wallet.name validator
btcli wallet new_hotkey --wallet.name validator --wallet.hotkey default
# Fund your coldkey with TAO for registration and staking
# Register on subnet
btcli subnet register --netuid 39 --wallet.name validator --wallet.hotkey default
# Check registration
btcli wallet overview --wallet.name validator
# Add stake for validator permit (amount depends on subnet requirements)
btcli stake add --wallet.name validator --wallet.hotkey default --amount 10000
Validator Permit Requirements:
- Minimum stake varies by subnet configuration
- Check current validators: btcli metagraph --netuid 39
- Your stake must be competitive with existing validators
Firewall Rules:
# Allow Bittensor axon port (for other validators/miners)
sudo ufw allow 9090/tcp
# Allow API port (for external services)
sudo ufw allow 8080/tcp
# Allow SSH for administration
sudo ufw allow 22/tcp
# Enable firewall
sudo ufw enable
Port Forwarding (if behind NAT):
- Forward external port 9090 → validator server:9090
- Forward external port 8080 → validator server:8080
- Ensure external_ip is set correctly in config
This section explains the deep technical theory of how validation works in Basilica.
How it Works (code: miner_prover/discovery.rs:40-122):
- Metagraph Query:
  - Validator queries Bittensor subtensor for subnet state
  - Retrieves all neurons (validators + miners) on the configured netuid
  - Metagraph contains: UID, hotkey, stake, endpoint (AxonInfo)
- Miner Filtering:
  // Filters out validators, keeps only miners
  if neuron.validator_permit {
      continue; // Skip validators
  }
- Endpoint Extraction:
  - Parses IP address from u128 format → IPv4/IPv6 string
  - Validates IP is not 0.0.0.0 or ::
  - Validates port is not 0
  - Formats as: http://{ip}:{port}
- Result:
  Vec<MinerInfo> {
      uid: u16,
      hotkey: String,   // SS58 AccountId
      endpoint: String, // http://ip:port
      stake: u64,       // in RAO
  }
Key Insight: Validators discover ALL miners from metagraph. There's no centralized registry or manual configuration required.
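The endpoint-extraction step can be sketched with the standard library alone. This is a minimal illustration, not the code in discovery.rs: the function name `parse_endpoint`, the `ip_type` parameter (assumed to be 4 or 6), and the bracketed form for IPv6 URLs are all assumptions made for the example.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical helper: turn a metagraph-style (u128 ip, ip_type, port)
/// triple into an `http://ip:port` endpoint, or None if it is unusable.
fn parse_endpoint(ip: u128, ip_type: u8, port: u16) -> Option<String> {
    let addr = match ip_type {
        // Assumption: IPv4 addresses occupy the low 32 bits of the u128.
        4 => IpAddr::V4(Ipv4Addr::from(u32::try_from(ip).ok()?)),
        6 => IpAddr::V6(Ipv6Addr::from(ip)),
        _ => return None,
    };
    // Reject 0.0.0.0, ::, and port 0, as described in the list above.
    if addr.is_unspecified() || port == 0 {
        return None;
    }
    Some(match addr {
        IpAddr::V4(v4) => format!("http://{}:{}", v4, port),
        // Assumption: IPv6 hosts are bracketed so the URL stays parseable.
        IpAddr::V6(v6) => format!("http://[{}]:{}", v6, port),
    })
}
```

For instance, `parse_endpoint(0xC0A80164, 4, 8091)` yields `http://192.168.1.100:8091`, while an all-zero address or a zero port yields `None`.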
Authentication Flow (code: miner_prover/miner_client.rs:49-150):
Step 1: Validator → Miner Authentication:
ValidatorAuthRequest {
validator_hotkey: "5G3qVa...", // Validator's SS58 hotkey
timestamp: 1704067200, // Current UTC timestamp
signature: "0xabcd...", // Sr25519 signature
ssh_public_key: "ssh-ed25519 AAAA...", // Validator's ephemeral SSH key
}
Signature Payload:
let payload = format!(
"BASILICA_AUTH_V1:{}:{}:{}",
nonce,
target_miner_hotkey,
timestamp_secs
);
// Signed with validator's Bittensor keypair
Step 2: Miner Validates:
- Verifies signature using validator_hotkey
- Checks timestamp is fresh (within 5 minutes)
- Validates nonce uniqueness (replay attack prevention)
- Deploys SSH public key to all nodes (if provided)
Step 3: Miner → Validator Response:
MinerAuthResponse {
success: true,
message: "Authenticated successfully",
session_token: "uuid-session-token", // 1-hour expiry
}
Why Cryptographic Auth?
- Prevents impersonation (must control Bittensor hotkey)
- No passwords or API keys to manage
- Timestamp + nonce prevent replay attacks
- Aligns with Bittensor's identity system
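The freshness and replay checks from Step 2 can be sketched as follows. This is an illustration under stated assumptions, not the miner's actual code: `ReplayGuard` and `auth_payload` are hypothetical names, nonces are kept in an in-memory set with no eviction, and Sr25519 signature verification is omitted because it needs an external crate.

```rust
use std::collections::HashSet;

/// Maximum accepted clock difference, matching the 5-minute window above.
const MAX_SKEW_SECS: u64 = 300;

/// Hypothetical helper: rebuild the string that the validator signs.
fn auth_payload(nonce: &str, target_miner_hotkey: &str, timestamp_secs: u64) -> String {
    format!("BASILICA_AUTH_V1:{}:{}:{}", nonce, target_miner_hotkey, timestamp_secs)
}

/// Hypothetical in-memory tracker of nonces that were already accepted.
struct ReplayGuard {
    seen: HashSet<String>,
}

impl ReplayGuard {
    fn new() -> Self {
        ReplayGuard { seen: HashSet::new() }
    }

    /// Returns true only if the timestamp is fresh and the nonce is new.
    /// Stale requests are rejected before their nonce is recorded.
    fn check(&mut self, nonce: &str, timestamp_secs: u64, now_secs: u64) -> bool {
        let fresh = now_secs.abs_diff(timestamp_secs) <= MAX_SKEW_SECS;
        fresh && self.seen.insert(nonce.to_string())
    }
}
```

A second request that reuses an accepted nonce fails `check`, even when its timestamp is still inside the window.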
Discovery Protocol (code: miner_prover/miner_client.rs:160-210):
After authentication, validator calls DiscoverNodes RPC:
Request:
DiscoverNodesRequest {
validator_hotkey: "5G3qVa...",
}
Response (streaming):
NodeConnectionDetails {
node_id: "550e8400-e29b-41d4-a716-446655440000", // UUID
host: "192.168.1.100",
port: 22,
username: "basilica",
ssh_endpoint: "ssh://192.168.1.100:22",
}
Key Details:
- Miner has already deployed validator's SSH public key to these nodes
- node_id is a deterministic UUID derived from username@host:port
- SSH access is immediate (no manual key exchange needed)
Fallback Mechanism:
- Applies if use_dynamic_discovery = false or gRPC fails
- Falls back to static SSH configuration from database
- Requires manual node configuration (not recommended)
Strategy Selection Logic (code: miner_prover/validation_strategy.rs):
use std::time::Duration;

enum Strategy { Full, Lightweight }

// Simplified sketch of the selection logic; `age` is the time since the last full validation.
fn determine_strategy(age: Option<Duration>, previous_failures: u32) -> Strategy {
    const MAX_AGE: Duration = Duration::from_secs(6 * 60 * 60); // 6 hours
    match age {
        None => Strategy::Full,                             // Never validated
        Some(a) if a > MAX_AGE => Strategy::Full,           // Too old
        Some(_) if previous_failures > 0 => Strategy::Full, // Had issues
        Some(_) => Strategy::Lightweight,                   // Recently validated successfully
    }
}
Full Validation Workflow (code: miner_prover/verification.rs:1583-1596):
- SSH Connection:
  ssh -i /tmp/validator_ssh_keys/ephemeral_key.pem basilica@node_ip
- Binary Upload:
  # Upload validator-binary (verification executor)
  scp validator-binary basilica@node:/tmp/
  # Upload executor-binary (for GPU attestation)
  scp executor-binary basilica@node:/tmp/
- Remote Execution:
  # Execute validation binary
  /tmp/validator-binary \
    --executor-binary /tmp/executor-binary \
    --output-format json \
    > /tmp/validation_results.json
- Result Download:
  scp basilica@node:/tmp/validation_results.json ./results/
- Validation Parsing:
  {
    "gpu_attestation": {
      "gpus": [
        {
          "uuid": "GPU-550e8400-e29b-41d4-a716-446655440000",
          "model": "NVIDIA H100 PCIe",
          "vram_gb": 80,
          "signature": "0xabcd..."
        }
      ]
    },
    "hardware_profile": { "cpu": "...", "ram_gb": 512, "disk_gb": 2000 },
    "docker_validation": { "service_active": true, "version": "24.0.7" },
    "network_profile": { "download_mbps": 10000, "upload_mbps": 5000 },
    "storage_validation": { "available_bytes": 1099511627776 }
  }
- Score Calculation:
  // Full validation score
  let ssh_score = if ssh_connected { 0.5 } else { 0.0 };
  let binary_score = if validation_passed { 0.5 } else { 0.0 };
  let total_score = ssh_score + binary_score; // 0.0 - 1.0
- Database Storage:
  - Store GPU UUIDs in the gpu_uuid_assignments table
  - Store hardware profile in the node_hardware_profile table
  - Store validation result in the verification_logs table
  - Update miner_gpu_profiles for scoring
Lightweight Validation Workflow (code: miner_prover/verification.rs:1566-1581):
- SSH Connection Test:
  ssh -i ephemeral_key.pem -o ConnectTimeout=10 basilica@node echo "ok"
- Update Timestamp:
  UPDATE verification_logs SET last_health_check = NOW() WHERE node_id = ?;
- Score Reuse:
  - If SSH succeeds: Reuse previous validation score
  - If SSH fails: Set score to 0.0 and trigger full validation next round
Why Two Tiers?
- Security: Full validation every 6 hours ensures integrity
- Efficiency: Lightweight checks every 10 minutes provide fast feedback
- Resource Optimization: Avoid uploading binaries unnecessarily
- Network Health: Quick detection of offline nodes
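The scoring rules for the two tiers can be put side by side in one function. This is a sketch of the rules as described above, not the code in verification.rs; the names `Tier` and `node_score` are made up for the example.

```rust
/// Which validation tier produced the observation (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Full,
    Lightweight,
}

/// Hypothetical helper combining the two scoring rules.
/// `previous_score` is the score from the last full validation.
fn node_score(tier: Tier, ssh_connected: bool, binary_passed: bool, previous_score: f64) -> f64 {
    match tier {
        // Full validation: 50% for SSH success plus 50% for binary validation.
        Tier::Full => {
            let ssh_score = if ssh_connected { 0.5 } else { 0.0 };
            let binary_score = if binary_passed { 0.5 } else { 0.0 };
            ssh_score + binary_score
        }
        // Lightweight check: reuse the previous score, or drop to 0.0 on SSH failure.
        Tier::Lightweight => {
            if ssh_connected {
                previous_score
            } else {
                0.0
            }
        }
    }
}
```

A node that passed its last full validation with 0.95 keeps that score across lightweight checks until an SSH failure resets it to 0.0.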
Concurrency Model (code: miner_prover/scheduler.rs:268-338):
use futures::stream::{self, StreamExt};

// Build one verification future per miner (nothing runs until polled)
let tasks = miners.iter().map(|miner| verify_miner(miner));
// Drive at most `max_concurrent_verifications` futures at a time
let results: Vec<_> = stream::iter(tasks)
    .buffer_unordered(config.max_concurrent_verifications) // Default: 50
    .collect()
    .await;
Dual Pipeline Architecture:
- Full Validation Pipeline: Runs independently with its own scheduler
- Lightweight Validation Pipeline: Runs in parallel with full pipeline
- Cleanup Pipeline: Runs every 15 minutes to remove stale tasks
Resource Limits:
- max_concurrent_verifications: 50 (lightweight SSH checks)
- max_concurrent_full_validations: 1024 (binary validation requests)
- max_miners_per_round: 20 (miners verified per cycle)
Scoring Engine (code: scoring/gpu_scoring_engine.rs):
- GPU Profile Aggregation:
  SELECT miner_uid, gpu_counts_json, total_score
  FROM miner_gpu_profiles
  WHERE last_successful_validation > NOW() - INTERVAL '6 hours';
- GPU Count Extraction:
  { "H100": 8, "A100": 16, "B200": 4 }
- Category Scoring:
  // For each GPU category (e.g., "H100")
  let category_score = verification_score * gpu_count;
  // Example: 0.95 validation score * 8 GPUs = 7.6 category score
- Normalization:
  // Within each category, normalize scores to 0.0-1.0
  let normalized_score = miner_score / category_total_score;
Multi-GPU Miners:
- Miners with multiple GPU categories score in each category
- Each category has independent weight allocation
- Example: Miner with 8x H100 + 16x A100 scores in both categories
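The category scoring and normalization steps can be combined into one small function. This is a sketch under assumptions: `normalize_category` is a hypothetical name, its input is a list of (miner UID, verification score, GPU count) tuples for a single category, and a category whose total is zero is assumed to give every miner 0.0.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: compute each miner's share of one GPU category.
/// Input tuples are (miner_uid, verification_score, gpu_count).
fn normalize_category(miners: &[(u16, f64, u32)]) -> BTreeMap<u16, f64> {
    // category_score = verification_score * gpu_count
    let raw: Vec<(u16, f64)> = miners
        .iter()
        .map(|&(uid, score, gpus)| (uid, score * gpus as f64))
        .collect();
    let total: f64 = raw.iter().map(|&(_, s)| s).sum();
    // normalized_score = miner_score / category_total_score
    raw.into_iter()
        .map(|(uid, s)| (uid, if total > 0.0 { s / total } else { 0.0 }))
        .collect()
}
```

Two miners that each hold 8 GPUs with a 0.95 verification score each end up with a normalized score of 0.5 in that category.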
Emission Distribution (code: config/emission.rs):
Configuration:
[emission]
burn_percentage = 5.0 # Burn 5% of emissions
burn_uid = 204 # Send burn weight to UID 204
[emission.gpu_allocations]
H100 = { weight = 50.0, min_gpu_count = 4 }
A100 = { weight = 30.0, min_gpu_count = 2 }
B200 = { weight = 20.0, min_gpu_count = 1 }
Weight Calculation (code: bittensor_core/weight_setter.rs:48-200):
- Burn Weight:
  let burn_weight = (u16::MAX as f64 * burn_percentage / 100.0) as u16;
  weights[burn_uid] = burn_weight;
- Category Allocation:
  // Remaining emissions after burn
  let remaining_emissions = u16::MAX - burn_weight;
  // Per category
  let category_allocation = remaining_emissions * (category_weight / 100.0);
- Miner Weights within Category:
  // Proportional to normalized score
  let miner_weight = category_allocation * normalized_score;
- Example:
  Total emissions: 65535 (u16::MAX)
  Burn (5%): 3277 → UID 204
  Remaining: 62258
  H100 allocation (50%): 31129
    - Miner 1 (score 0.6): 18677
    - Miner 2 (score 0.4): 12452
  A100 allocation (30%): 18677
    - Miner 3 (score 1.0): 18677
  B200 allocation (20%): 12452
    - Miner 1 (score 0.5): 6226
    - Miner 4 (score 0.5): 6226
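The arithmetic in the example can be checked with a short sketch. One detail is an assumption: the figures in the example match rounding to the nearest integer, whereas a plain `as u16` cast truncates (which would give a burn weight of 3276), so the sketch below uses `round()` to reproduce the example. The function names are hypothetical.

```rust
/// Hypothetical helper: weight sent to the burn UID, rounded to nearest.
fn burn_weight(burn_percentage: f64) -> u16 {
    (u16::MAX as f64 * burn_percentage / 100.0).round() as u16
}

/// Hypothetical helper: one miner's weight inside one GPU category.
/// `category_weight` is a percentage; `normalized_score` is in 0.0-1.0.
fn miner_weight(burn: u16, category_weight: f64, normalized_score: f64) -> u16 {
    // Emissions left after the burn, shared out by category percentage.
    let remaining = (u16::MAX - burn) as f64;
    let category_allocation = remaining * (category_weight / 100.0);
    // Proportional to the miner's normalized score within the category.
    (category_allocation * normalized_score).round() as u16
}
```

With a 5% burn, `burn_weight(5.0)` gives 3277, and `miner_weight(3277, 50.0, 0.6)` gives 18677, which is Miner 1's H100 share in the example.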
Weight Setting Frequency:
- Checks blockchain block every 12 seconds (Bittensor block time)
- Sets weights every N blocks (default: 360 blocks ≈ 72 minutes)
- Configurable via weight_set_interval_blocks
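The timing above can be sketched as a simple block-count check. This assumes the weight setter remembers the block at which it last set weights; the real trigger condition in weight_setter.rs may differ, and both function names are hypothetical.

```rust
/// Bittensor block time in seconds, as stated above.
const BLOCK_TIME_SECS: u64 = 12;

/// Hypothetical check: has a full interval passed since the last weight set?
fn should_set_weights(current_block: u64, last_set_block: u64, interval_blocks: u64) -> bool {
    current_block.saturating_sub(last_set_block) >= interval_blocks
}

/// Approximate wall-clock length of an interval, in whole minutes.
fn interval_minutes(interval_blocks: u64) -> u64 {
    interval_blocks * BLOCK_TIME_SECS / 60
}
```

With the default of 360 blocks, `interval_minutes(360)` returns 72, matching the figure in the list.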
Validators use ephemeral SSH keys for node verification, enhancing security through short-lived credentials.
Key Manager (code: ssh/key_manager.rs):
// Validator generates ephemeral key pair
let keypair = Ed25519KeyPair::generate();
// Stores in configured directory
ssh_key_directory = "/tmp/validator_ssh_keys"
// Key files
/tmp/validator_ssh_keys/validator_{hotkey}_{timestamp}.pem // Private key
/tmp/validator_ssh_keys/validator_{hotkey}_{timestamp}.pem.pub // Public key
Default Settings:
- Algorithm: ed25519 (recommended for performance and security)
- Key Lifetime: Determined by miner's authorization TTL (typically 1 hour)
- Storage: Temporary directory (cleaned up after use)
- Cleanup Interval: 60 seconds (removes expired keys)
Step-by-Step (code: miner_prover/verification.rs:440-585):
- Validator Generates Key:
  ssh-keygen -t ed25519 -f /tmp/validator_ssh_keys/ephemeral_key
- Validator Sends Public Key to Miner:
  ValidatorAuthRequest {
      validator_hotkey: "5G3qVa...",
      ssh_public_key: "ssh-ed25519 AAAAC3Nza... validator@basilica",
      ...
  }
- Miner Deploys Key to All Nodes:
  # On each node, miner adds key to authorized_keys
  ssh root@node1 'echo "ssh-ed25519 AAAAC3Nza... validator-5G3qVa..." >> ~/.ssh/authorized_keys'
  ssh root@node2 'echo "ssh-ed25519 AAAAC3Nza... validator-5G3qVa..." >> ~/.ssh/authorized_keys'
- Validator SSHs to Nodes:
  ssh -i /tmp/validator_ssh_keys/ephemeral_key basilica@node1
- Key Cleanup (after verification):
  # Validator removes local private key
  rm /tmp/validator_ssh_keys/ephemeral_key
  # Miner removes authorized key from nodes (after session expiry)
  ssh root@node1 'sed -i "/validator-5G3qVa/d" ~/.ssh/authorized_keys'
For long-term access (e.g., rentals), validators can use persistent keys:
[ssh_session]
persistent_ssh_key_path = "/opt/basilica/keys/validator_persistent.pem"
ssh_key_directory = "/tmp/validator_ssh_keys"
Persistent Key Setup:
# Generate persistent key (file name matches persistent_ssh_key_path above)
ssh-keygen -t ed25519 -f /opt/basilica/keys/validator_persistent.pem -C "validator-persistent"
# Set secure permissions
chmod 600 /opt/basilica/keys/validator_persistent.pem
Use Cases:
- GPU rentals with extended duration
- Manual node administration
- Debugging and troubleshooting
Ephemeral Keys Advantages:
- ✅ Short-lived credentials (reduced exposure window)
- ✅ Automatic rotation per verification session
- ✅ No long-term key storage on validator
- ✅ Miner controls access duration
SSH Security Settings (code: ssh/session.rs):
SshSessionConfig {
ssh_connection_timeout: Duration::from_secs(30),
ssh_command_timeout: Duration::from_secs(60),
ssh_retry_attempts: 3,
ssh_retry_delay: Duration::from_secs(2),
strict_host_key_checking: false, // Nodes have dynamic IPs
known_hosts_file: None, // Trust miner-provided endpoints
}
Audit Logging:
[ssh_session]
enable_audit_logging = true
audit_log_path = "/var/log/basilica/ssh_audit.log"
Audit Log Format:
2024-01-01T12:00:00Z validator-5G3qVa connected to node-550e8400 (192.168.1.100:22)
2024-01-01T12:00:15Z validator-5G3qVa executed command on node-550e8400: /tmp/validator-binary
2024-01-01T12:01:30Z validator-5G3qVa disconnected from node-550e8400 (duration: 90s)
Comprehensive breakdown of all configuration options with examples and explanations.
Location: validator.toml
Layered Loading (priority order):
- Environment variables (highest priority)
- TOML configuration file
- Compiled defaults (lowest priority)
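The priority order can be expressed in a few lines. This is a sketch, not the validator's configuration loader: `resolve` and `env_var_name` are hypothetical helpers, and the variable naming assumes a `BASILICA_` prefix with `__` separating nesting levels.

```rust
/// Hypothetical helper: pick a setting by priority.
/// Environment variable first, then the TOML file, then the compiled default.
fn resolve<T>(env: Option<T>, file: Option<T>, default: T) -> T {
    env.or(file).unwrap_or(default)
}

/// Hypothetical helper: environment variable name for a dotted config key,
/// assuming a `BASILICA_` prefix and `__` between nesting levels.
fn env_var_name(key: &str) -> String {
    format!("BASILICA_{}", key.to_uppercase().replace('.', "__"))
}
```

Under these assumptions, `env_var_name("database.url")` returns `BASILICA_DATABASE__URL`, and `resolve(Some(300), Some(600), 900)` returns 300 because the environment value wins.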
Example: Override with environment variables:
# Override database URL
export BASILICA_DATABASE__URL="postgresql://user:pass@localhost/validator"
# Override verification interval
export BASILICA_VERIFICATION__VERIFICATION_INTERVAL__SECS=300
# Run validator
./basilica-validator --config validator.toml start
# === Bittensor Network Configuration ===
[bittensor]
# Wallet name (coldkey) - matches ~/.bittensor/wallets/{wallet_name}/
wallet_name = "validator"
# Hotkey name - matches ~/.bittensor/wallets/{wallet_name}/hotkeys/{hotkey_name}
hotkey_name = "default"
# Network selection: "finney" (mainnet), "test" (testnet), or "local"
network = "finney"
# Subnet ID: 39 for mainnet, 387 for testnet
netuid = 39
# Chain endpoint (auto-detected if not specified)
# Mainnet: wss://entrypoint-finney.opentensor.ai:443
# Testnet: wss://test.finney.opentensor.ai:443
# chain_endpoint = "wss://entrypoint-finney.opentensor.ai:443"
# Axon server port (for Bittensor network communication)
axon_port = 9090
# External IP address (required for proper network advertisement)
external_ip = "203.0.113.10"
# Optional: Override advertised axon endpoint
# advertised_axon_endpoint = "http://validator.example.com:9090"
# advertised_axon_tls = false
# === Database Configuration ===
[database]
# Database URL: SQLite or PostgreSQL
# SQLite (default): sqlite:./data/validator.db
# PostgreSQL: postgresql://user:pass@localhost:5432/validator
url = "sqlite:./data/validator.db"
# Connection pool settings
max_connections = 10
min_connections = 1
# Run database migrations on startup
run_migrations = true
# Connection timeout
[database.connect_timeout]
secs = 30
nanos = 0
# Idle connection timeout
[database.idle_timeout]
secs = 600
nanos = 0
# Maximum connection lifetime
[database.max_lifetime]
secs = 3600
nanos = 0
# === HTTP API Server Configuration ===
[server]
# API server bind address
host = "0.0.0.0"
port = 8080
# === Logging Configuration ===
[logging]
# Log level: trace, debug, info, warn, error
level = "info"
# Log format: json, pretty, compact
format = "pretty"
# Optional: Log to file
# file = "/var/log/basilica/validator.log"
# === Metrics Configuration ===
[metrics]
# Enable Prometheus metrics
enabled = true
# Metrics collection interval
[metrics.collection_interval]
secs = 30
nanos = 0
# Prometheus exporter settings
[metrics.prometheus]
host = "127.0.0.1"
port = 9090
path = "/metrics"
# Default labels for all metrics
[metrics.default_labels]
# env = "production"
# region = "us-east"
# Metrics retention period
[metrics.retention_period]
secs = 604800 # 7 days
nanos = 0
# === Verification Configuration ===
[verification]
# How often to run verification rounds
[verification.verification_interval]
secs = 600 # 10 minutes
nanos = 0
# Maximum concurrent lightweight verifications (SSH checks)
max_concurrent_verifications = 50
# Maximum concurrent full validations (binary executions)
max_concurrent_full_validations = 1024
# Timeout for individual verification challenges
[verification.challenge_timeout]
secs = 120
nanos = 0
# Minimum score threshold for miners (0.0 - 1.0)
min_score_threshold = 0.1
# Maximum miners to verify per round
max_miners_per_round = 20
# Minimum interval between verifying the same miner
[verification.min_verification_interval]
secs = 1800 # 30 minutes
nanos = 0
# Subnet ID (should match bittensor.netuid)
netuid = 39
# Use dynamic SSH endpoint discovery from miners
use_dynamic_discovery = true
# Timeout for miner discovery operations
[verification.discovery_timeout]
secs = 30
nanos = 0
# Fall back to static SSH config if dynamic discovery fails
fallback_to_static = true
# Cache miner endpoint info TTL
[verification.cache_miner_info_ttl]
secs = 300 # 5 minutes
nanos = 0
# gRPC port offset from miner's axon port (default: uses port 8080)
# grpc_port_offset = 1000 # Would use axon_port + 1000
# Collateral event scan interval (blockchain monitoring)
[verification.collateral_event_scan_interval]
secs = 12 # 1 Bittensor block
nanos = 0
# Interval between full binary validations per node
[verification.node_validation_interval]
secs = 21600 # 6 hours
nanos = 0
# Time period for cleaning up GPU assignments from offline nodes
[verification.gpu_assignment_cleanup_ttl]
secs = 7200 # 2 hours
nanos = 0
# Enable worker queue for decoupled validation execution
enable_worker_queue = false
# Binary validation settings
[verification.binary_validation]
# Path to validator-binary executable
validator_binary_path = "./validator-binary"
# Path to executor-binary for upload
executor_binary_path = "./executor-binary"
# Binary execution timeout
execution_timeout_secs = 1200 # 20 minutes
# Output format
output_format = "json"
# Enable binary validation
enabled = true
# Binary validation score weight
score_weight = 0.8
# Default node port for SSH tunnel cleanup
node_port = 3000
# Validation server mode configuration
[verification.binary_validation.server_mode]
bind_address = "127.0.0.1:4010"
remote_concurrency = 1024
verify_concurrency = 1
queue_capacity = 4096
health_check_interval_secs = 30
job_poll_interval_ms = 500
max_poll_attempts = 2400
server_ready_timeout_secs = 30
server_ready_check_interval_ms = 500
# Docker validation settings
[verification.docker_validation]
docker_image = "nvidia/cuda:12.8.0-runtime-ubuntu22.04"
pull_timeout_secs = 1800 # 30 minutes
# Storage validation settings
[verification.storage_validation]
min_required_storage_bytes = 1099511627776 # 1TB
# === Automatic Verification Configuration ===
[automatic_verification]
# Enable automatic verification during discovery
enabled = true
# Discovery verification interval in seconds
discovery_interval = 300 # 5 minutes
# Minimum time between verifications for same miner
min_verification_interval_hours = 1
# Maximum concurrent verifications
max_concurrent_verifications = 50
# Enable SSH session automation
enable_ssh_automation = true
# === Storage Configuration ===
[storage]
# Data directory for validator storage
data_dir = "./data"
# === API Configuration ===
[api]
# API server bind address (external services)
bind_address = "0.0.0.0:8080"
# Maximum request body size (bytes)
max_body_size = 1048576 # 1MB
# Optional: API key for authentication
# api_key = "your-secret-api-key"
# Default miner port for connections
miner_port = 8091
# === SSH Session Configuration ===
[ssh_session]
# Directory for ephemeral SSH keys
ssh_key_directory = "/tmp/validator_ssh_keys"
# SSH key algorithm: "ed25519" or "rsa"
key_algorithm = "ed25519"
# Optional: Persistent SSH private key path
# persistent_ssh_key_path = "/opt/basilica/keys/validator_persistent.pem"
# Default session duration (seconds)
default_session_duration = 300 # 5 minutes
# Maximum session duration (seconds)
max_session_duration = 3600 # 1 hour
# Rental session duration (0 = no predetermined duration)
rental_session_duration = 0
# Key cleanup interval
[ssh_session.key_cleanup_interval]
secs = 60
nanos = 0
# Enable automated SSH session management
enable_automated_sessions = true
# Maximum concurrent SSH sessions
max_concurrent_sessions = 5
# Session rate limit per hour
session_rate_limit = 20
# Enable SSH audit logging
enable_audit_logging = true
# Audit log file path
audit_log_path = "/var/log/basilica/ssh_audit.log"
# SSH connection timeout
[ssh_session.ssh_connection_timeout]
secs = 30
nanos = 0
# SSH command execution timeout
[ssh_session.ssh_command_timeout]
secs = 60
nanos = 0
# SSH retry attempts on connection failure
ssh_retry_attempts = 3
# Delay between retry attempts
[ssh_session.ssh_retry_delay]
secs = 2
nanos = 0
# Strict host key checking (false for dynamic node IPs)
strict_host_key_checking = false
# Known hosts file path (None = don't check)
# known_hosts_file = "/home/validator/.ssh/known_hosts"
# === Emission Configuration ===
[emission]
# Percentage of emissions to burn (0.0 - 100.0)
burn_percentage = 0.0
# UID to send burn weights to
burn_uid = 204
# Minimum miners required per GPU category to enable incentives
min_miners_per_category = 1
# Blocks between weight setting operations
weight_set_interval_blocks = 360
# Weight version key (for protocol upgrades)
weight_version_key = 0
# GPU model allocations with weights and minimum requirements
[emission.gpu_allocations]
H100 = { weight = 40.0, min_gpu_count = 4, min_gpu_vram = 80 }
A100 = { weight = 30.0, min_gpu_count = 2, min_gpu_vram = 40 }
B200 = { weight = 20.0, min_gpu_count = 1, min_gpu_vram = 192 }
H200 = { weight = 10.0, min_gpu_count = 1, min_gpu_vram = 141 }
# === Database Cleanup Configuration ===
[cleanup]
# Enable automatic database cleanup
enabled = true
# Cleanup interval
[cleanup.cleanup_interval]
secs = 3600 # 1 hour
nanos = 0
# Retention periods for different data types
[cleanup.verification_logs_retention]
secs = 2592000 # 30 days
nanos = 0
[cleanup.emission_metrics_retention]
secs = 7776000 # 90 days
nanos = 0
[cleanup.rental_logs_retention]
secs = 2592000 # 30 days
nanos = 0
Validate Before Starting:
# Validate configuration file
./basilica-validator --config validator.toml config validate
# Example output:
# ✓ Configuration validation passed
# ✓ Database connection successful
# ✓ Bittensor wallet found: validator/default
# ✓ Network connectivity confirmed
# ! Warning: external_ip not set (auto-detection will be used)
# ! Warning: GPU allocations total weight is 100.0% (recommended)
Common Validation Errors:
- Invalid wallet path:
  Error: Wallet not found at ~/.bittensor/wallets/validator/hotkeys/default
  Solution: Check wallet_name and hotkey_name match your Bittensor wallet
- Database connection failed:
  Error: Failed to connect to database: Connection refused
  Solution: Ensure database is running and URL is correct
- Invalid GPU allocations:
  Error: GPU allocation weights must sum to 100.0 (current: 95.0)
  Solution: Adjust gpu_allocations weights to total 100.0
- Network unreachable:
  Error: Cannot reach Bittensor chain endpoint
  Solution: Check internet connectivity and chain_endpoint URL
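The GPU allocation check is easy to reproduce before starting the validator. This is a sketch only: `validate_allocations` is a hypothetical function, the tolerance of 1e-6 is an assumption, and the error text simply mirrors the message quoted above.

```rust
/// Hypothetical pre-flight check: allocation weights must total 100.0.
/// Input pairs are (gpu_model, weight_percentage).
fn validate_allocations(allocations: &[(&str, f64)]) -> Result<(), String> {
    let total: f64 = allocations.iter().map(|&(_, weight)| weight).sum();
    // Assumption: a small tolerance absorbs floating-point rounding.
    if (total - 100.0).abs() > 1e-6 {
        return Err(format!(
            "GPU allocation weights must sum to 100.0 (current: {:.1})",
            total
        ));
    }
    Ok(())
}
```

Weights of 40, 30, 20, and 10 pass the check, while 40, 30, 20, and 5 return the error with `current: 95.0`.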
Four deployment methods for different use cases: Binary, Systemd, Docker, and Docker Compose.
Best for: Development, testing, manual control
Step 1: Build the Validator
# Clone repository
git clone https://github.com/your-org/basilica.git
cd basilica/basilica
# Build using the build script
./scripts/validator/build.sh
# Verify build
ls -lh basilica-validator
# Should show ~50MB binary
Step 2: Prepare Configuration
# Copy example config
cp config/validator.correct.toml config/validator.toml
# Edit configuration
nano config/validator.toml
# Set required fields:
# - bittensor.wallet_name
# - bittensor.hotkey_name
# - bittensor.external_ip
Step 3: Create Data Directories
# Create directories
mkdir -p data logs /tmp/validator_ssh_keys
# Set permissions
chmod 700 /tmp/validator_ssh_keys
Step 4: Run Validator
# Run in foreground (for testing)
./basilica-validator --config config/validator.toml start
# Run in background with nohup
nohup ./basilica-validator --config config/validator.toml start > logs/validator.log 2>&1 &
# Check process
ps aux | grep basilica-validator
# View logs
tail -f logs/validator.log
Step 5: Verify Operation
# Check health endpoint
curl http://localhost:8080/health
# Check miner discovery
curl http://localhost:8080/miners | jq
# Check metrics
curl http://localhost:9090/metrics
Best for: Production, auto-restart, system integration
Step 1: Build and Install
# Build validator
./scripts/validator/build.sh
# Create installation directory
sudo mkdir -p /opt/basilica/{bin,config,data,logs}
# Copy binary
sudo cp basilica-validator /opt/basilica/bin/
# Copy configuration
sudo cp config/validator.toml /opt/basilica/config/
# Set ownership
sudo chown -R $USER:$USER /opt/basilica
Step 2: Create Systemd Service File
Create /etc/systemd/system/basilica-validator.service:
[Unit]
Description=Basilica Validator
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/basilica
ExecStart=/opt/basilica/bin/basilica-validator --config /opt/basilica/config/validator.toml start
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=basilica-validator
# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectSystem=strict
ReadWritePaths=/opt/basilica/data /opt/basilica/logs /tmp/validator_ssh_keys
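# read-only rather than yes: ProtectHome=yes hides /root and /home from the service,
# which would make the wallet under ~/.bittensor/wallets unreadable
ProtectHome=read-only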
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
[Install]
WantedBy=multi-user.target
Step 3: Enable and Start Service
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable basilica-validator
# Start service
sudo systemctl start basilica-validator
# Check status
sudo systemctl status basilica-validator
# View logs
sudo journalctl -u basilica-validator -f
# View recent logs
sudo journalctl -u basilica-validator -n 100
Step 4: Service Management
# Stop service
sudo systemctl stop basilica-validator
# Restart service
sudo systemctl restart basilica-validator
# Disable service (don't start on boot)
sudo systemctl disable basilica-validator
# View service configuration
sudo systemctl cat basilica-validator
Docker. Best for: Containerized environments, easy updates, isolation
Step 1: Build Docker Image
# Build image using build script
cd scripts/validator
./build.sh --docker
# Or build manually
docker build -t basilica-validator:latest -f Dockerfile ../..
# Verify image
docker images | grep basilica-validator
Step 2: Prepare Configuration and Volumes
# Create host directories
mkdir -p /opt/basilica/{config,data,logs,wallets,ssh_keys}
# Copy configuration
cp ../../config/validator.toml /opt/basilica/config/
# Copy Bittensor wallet (or mount existing)
cp -r ~/.bittensor/wallets /opt/basilica/
# Set permissions
chmod 700 /opt/basilica/ssh_keys
chmod 700 /opt/basilica/wallets
Step 3: Run Container
# Run with Docker
docker run -d \
--name basilica-validator \
--restart unless-stopped \
-p 9090:9090 \
-p 8080:8080 \
-v /opt/basilica/config:/opt/basilica/config:ro \
-v /opt/basilica/data:/opt/basilica/data \
-v /opt/basilica/logs:/opt/basilica/logs \
-v /opt/basilica/wallets:/root/.bittensor/wallets:ro \
-v /opt/basilica/ssh_keys:/tmp/validator_ssh_keys \
basilica-validator:latest \
--config /opt/basilica/config/validator.toml start
# Check container status
docker ps | grep basilica-validator
# View logs
docker logs -f basilica-validator
# View recent logs
docker logs --tail 100 basilica-validator
Step 4: Container Management
# Stop container
docker stop basilica-validator
# Start container
docker start basilica-validator
# Restart container
docker restart basilica-validator
# Remove container
docker rm -f basilica-validator
# Update to new version (rebuild as in Step 1, or pull if the image is published to a registry)
docker pull basilica-validator:latest
docker stop basilica-validator
docker rm basilica-validator
# Re-run docker run command from Step 3
Docker Compose. Best for: Production, monitoring stack, easy management
Step 1: Prepare Compose File
Location: scripts/validator/compose.prod.yml
version: '3.8'
services:
  validator:
    image: basilica-validator:latest
    container_name: basilica-validator
    restart: unless-stopped
    command: --config /opt/basilica/config/validator.toml start
    ports:
      - "9090:9090" # Bittensor axon and metrics (this guide uses port 9090 for both)
      - "8080:8080" # API server
    volumes:
      - /opt/basilica/config:/opt/basilica/config:ro
      - /opt/basilica/data:/opt/basilica/data
      - /opt/basilica/logs:/opt/basilica/logs
      - ~/.bittensor/wallets:/root/.bittensor/wallets:ro
      - validator_ssh_keys:/tmp/validator_ssh_keys
    environment:
      - RUST_LOG=info
      - RUST_BACKTRACE=1
    networks:
      - basilica_network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
  prometheus:
    image: prom/prometheus:latest
    container_name: basilica-prometheus
    restart: unless-stopped
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - "9091:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    networks:
      - basilica_network
  grafana:
    image: grafana/grafana:latest
    container_name: basilica-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin # change the default password before exposing Grafana
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards:ro
      - ./grafana/datasources:/etc/grafana/provisioning/datasources:ro
    networks:
      - basilica_network
    depends_on:
      - prometheus
volumes:
  validator_ssh_keys:
  prometheus_data:
  grafana_data:
networks:
  basilica_network:
    driver: bridge
Step 2: Create Prometheus Configuration
Location: scripts/validator/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'basilica-validator'
    static_configs:
      - targets: ['validator:9090']
        labels:
          service: 'validator'
Step 3: Deploy Stack
# Navigate to validator scripts
cd scripts/validator
# Ensure configuration exists
ls /opt/basilica/config/validator.toml
# Deploy with Docker Compose
docker compose -f compose.prod.yml up -d
# Check all services
docker compose -f compose.prod.yml ps
# View logs
docker compose -f compose.prod.yml logs -f validator
# View specific service logs
docker compose -f compose.prod.yml logs -f prometheus
docker compose -f compose.prod.yml logs -f grafana
Step 4: Access Services
# Validator API
curl http://localhost:8080/health
# Prometheus (open in a browser; on Linux desktops use xdg-open instead of open)
open http://localhost:9091
# Grafana
open http://localhost:3000
# Login: admin / admin
Step 5: Stack Management
# Stop and remove containers and networks (named volumes are kept)
docker compose -f compose.prod.yml down
# Stop containers without removing them
docker compose -f compose.prod.yml stop
# Start services
docker compose -f compose.prod.yml start
# Restart specific service
docker compose -f compose.prod.yml restart validator
# View resource usage
docker compose -f compose.prod.yml stats
# Remove everything including volumes
docker compose -f compose.prod.yml down -v
Use the provided deployment script for remote deployment:
# Deploy to remote server with systemd
./scripts/validator/deploy.sh \
--server user@validator.example.com:22 \
--mode systemd \
--sync-wallets \
--health-check \
--follow-logs
# Deploy with Docker
./scripts/validator/deploy.sh \
--server user@validator.example.com:22 \
--mode docker \
--sync-wallets
# Deploy with Docker Compose (recommended)
./scripts/validator/deploy.sh \
--server user@validator.example.com:22 \
--mode docker-compose \
--sync-wallets \
--health-check
Script Features (code: scripts/validator/deploy.sh):
- Builds validator locally
- Uploads binary and configuration to remote server
- Optionally syncs Bittensor wallets
- Installs and starts service
- Performs health checks
- Can follow logs after deployment
Complete walkthrough of how validators verify miners and their GPU nodes.
Trigger: Every verification_interval (default: 10 minutes)
Code Flow (miner_prover/discovery.rs:40-122):
// Query Bittensor metagraph
let neurons = subtensor.get_neurons(netuid).await?;
// Filter for miners (exclude validators)
let miners = neurons.iter()
.filter(|n| !n.validator_permit)
.map(|n| MinerInfo {
uid: n.uid,
hotkey: n.hotkey.to_ss58(),
endpoint: format!("http://{}:{}", n.axon_info.ip, n.axon_info.port),
stake: n.total_stake,
})
.collect();
Result: List of all miners on the subnet with their endpoints
Example Discovery Log:
2024-01-01T12:00:00Z INFO Discovered 47 miners from metagraph
2024-01-01T12:00:00Z DEBUG Miner UID 5: 5G3qVa... at http://203.0.113.10:8080 (stake: 12500 TAO)
2024-01-01T12:00:00Z DEBUG Miner UID 12: 5HGjWa... at http://198.51.100.20:8080 (stake: 8300 TAO)
...
Trigger: For each miner selected for verification
Code Flow (miner_prover/miner_client.rs:124-150):
// Generate ephemeral SSH key
let ssh_keypair = Ed25519KeyPair::generate();
let ssh_public_key = ssh_keypair.public_key_openssh();
// Create authentication request
// (`payload` is the authentication payload string whose format is shown after this snippet)
let auth_request = ValidatorAuthRequest {
validator_hotkey: config.bittensor.hotkey.to_string(),
timestamp: Utc::now().timestamp(),
signature: sign_payload(&payload, &bittensor_keypair),
ssh_public_key: Some(ssh_public_key),
nonce: generate_nonce(),
};
// Send to miner's gRPC endpoint
let response = miner_client.authenticate_validator(auth_request).await?;
Authentication Payload:
BASILICA_AUTH_V1:123456:5FHneW...:1704067200
Example Auth Log:
2024-01-01T12:00:05Z INFO Authenticating with miner UID 5 (5G3qVa...)
2024-01-01T12:00:05Z DEBUG Generated ephemeral SSH key: ssh-ed25519 AAAAC3Nza...
2024-01-01T12:00:06Z INFO Authentication successful, session token: 550e8400-e29b-41d4-a716-446655440000
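The example payload above is a colon-separated string. The sketch below builds and parses such a string; the field order (nonce, validator hotkey, timestamp) is an assumption read off the example, and both function names are hypothetical.

```rust
/// Build the string that gets signed.
/// Assumption: fields are prefix, nonce, validator hotkey, Unix timestamp,
/// in that order, as suggested by the example payload.
pub fn build_auth_payload(nonce: &str, validator_hotkey: &str, timestamp: i64) -> String {
    format!("BASILICA_AUTH_V1:{}:{}:{}", nonce, validator_hotkey, timestamp)
}

/// Split a payload back into (nonce, hotkey, timestamp).
/// Returns None for a wrong prefix, a missing field, a non-numeric
/// timestamp, or trailing extra fields.
pub fn parse_auth_payload(payload: &str) -> Option<(String, String, i64)> {
    let mut parts = payload.split(':');
    if parts.next()? != "BASILICA_AUTH_V1" {
        return None;
    }
    let nonce = parts.next()?.to_string();
    let hotkey = parts.next()?.to_string();
    let timestamp = parts.next()?.parse::<i64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((nonce, hotkey, timestamp))
}
```

Rejecting malformed payloads before verifying the signature keeps the signature check itself simple; the actual signing and verification (sr25519) are outside this sketch.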
Trigger: Immediately after successful authentication
Code Flow (miner_prover/miner_client.rs:160-210):
// Request node details from miner
let discover_request = DiscoverNodesRequest {
validator_hotkey: config.bittensor.hotkey.to_string(),
};
// Receive streaming response
let mut node_stream = miner_client.discover_nodes(discover_request).await?;
while let Some(node_details) = node_stream.message().await? {
nodes.push(NodeConnectionDetails {
node_id: node_details.node_id,
host: node_details.host,
port: node_details.port,
username: node_details.username,
ssh_endpoint: node_details.ssh_endpoint,
});
}
Example Discovery Response:
{
"node_id": "550e8400-e29b-41d4-a716-446655440000",
"host": "192.168.1.100",
"port": 22,
"username": "basilica",
"ssh_endpoint": "ssh://192.168.1.100:22"
}
Example Discovery Log:
2024-01-01T12:00:07Z INFO Discovered 3 nodes from miner UID 5
2024-01-01T12:00:07Z DEBUG Node 550e8400: ssh://192.168.1.100:22 (basilica@192.168.1.100)
2024-01-01T12:00:07Z DEBUG Node 660f9511: ssh://192.168.1.101:22 (basilica@192.168.1.101)
2024-01-01T12:00:07Z DEBUG Node 770fa622: ssh://192.168.1.102:22 (basilica@192.168.1.102)
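Each discovered node carries an ssh_endpoint string such as ssh://192.168.1.100:22. A minimal sketch of splitting it into host and port is shown below; the helper name is hypothetical, and bracketed IPv6 hosts are not handled.

```rust
/// Parse "ssh://host:port" into (host, port).
/// Hypothetical helper for illustration; returns None when the scheme,
/// host, or port is missing or the port is not a valid u16.
pub fn parse_ssh_endpoint(endpoint: &str) -> Option<(String, u16)> {
    let rest = endpoint.strip_prefix("ssh://")?;
    // Split on the last ':' so the port is always the final component.
    let (host, port) = rest.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port = port.parse::<u16>().ok()?;
    Some((host.to_string(), port))
}
```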
Trigger: For each node discovered
Code Flow (miner_prover/validation_strategy.rs:10-50):
// Check last validation time
let last_validation = db.get_last_validation(node_id).await?;
let strategy = match last_validation {
None => {
// Never validated
ValidationStrategy::Full
}
Some(validation) if now - validation.timestamp > Duration::from_secs(6 * 3600) => {
// More than 6 hours old
ValidationStrategy::Full
}
Some(validation) if validation.failures > 0 => {
// Previous failures
ValidationStrategy::Full
}
Some(_) => {
// Recently validated successfully
ValidationStrategy::Lightweight
}
};
Example Strategy Log:
2024-01-01T12:00:08Z DEBUG Node 550e8400: Last validated 2 hours ago → Lightweight
2024-01-01T12:00:08Z DEBUG Node 660f9511: Never validated → Full
2024-01-01T12:00:08Z DEBUG Node 770fa622: Last validated 8 hours ago → Full
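The selection rules above can be exercised in isolation. The following is a self-contained sketch of the same decision; the LastValidation struct and the choose_strategy name are assumptions made for the example, and the node's history is passed in directly rather than read from the database.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ValidationStrategy {
    Full,
    Lightweight,
}

/// Summary of a node's previous validation (hypothetical shape).
pub struct LastValidation {
    /// Time elapsed since the previous validation.
    pub age: Duration,
    /// Number of failures recorded for that validation.
    pub failures: u32,
}

/// Validations older than this trigger a full re-validation (6 hours).
const FULL_REVALIDATION_AFTER: Duration = Duration::from_secs(6 * 3600);

pub fn choose_strategy(last: Option<&LastValidation>) -> ValidationStrategy {
    match last {
        // Never validated
        None => ValidationStrategy::Full,
        // More than 6 hours old
        Some(v) if v.age > FULL_REVALIDATION_AFTER => ValidationStrategy::Full,
        // Previous failures
        Some(v) if v.failures > 0 => ValidationStrategy::Full,
        // Recently validated successfully
        Some(_) => ValidationStrategy::Lightweight,
    }
}
```

With the three nodes from the log above, a 2-hour-old clean validation selects Lightweight, while a missing or 8-hour-old validation selects Full.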
Trigger: Node requires full validation
Detailed Flow (miner_prover/verification.rs:1583-1596):
5a.1: SSH Connection:
ssh -i /tmp/validator_ssh_keys/ephemeral_550e8400.pem \
-o ConnectTimeout=30 \
-o StrictHostKeyChecking=no \
basilica@192.168.1.100
5a.2: Binary Upload:
# Binary upload and execution details are intentionally omitted from this guide
# Validator uploads verification binaries to node
# Executes GPU attestation and hardware profiling
# Downloads JSON results
5a.3: Result Parsing:
{
"gpu_attestation": {
"gpus": [
{
"uuid": "GPU-550e8400-e29b-41d4-a716-446655440000",
"model": "NVIDIA H100 PCIe",
"vram_gb": 80,
"cuda_version": "12.8",
"driver_version": "550.54.15",
"compute_capability": "9.0"
}
],
"validation_passed": true
},
"hardware_profile": {
"cpu_model": "AMD EPYC 9654",
"cpu_cores": 96,
"ram_gb": 512,
"disk_gb": 7680
},
"docker_validation": {
"service_active": true,
"docker_version": "24.0.7",
"nvidia_runtime": true,
"images_pulled": 1
},
"network_profile": {
"download_mbps": 10000,
"upload_mbps": 5000,
"latency_ms": 15
},
"storage_validation": {
"total_bytes": 8246337208320,
"available_bytes": 6597069766656,
"meets_requirement": true
}
}
5a.4: Score Calculation:
let ssh_score = if ssh_connected { 0.5 } else { 0.0 };
let binary_score = if validation_passed { 0.5 } else { 0.0 };
let total_score = ssh_score + binary_score;
5a.5: Database Storage:
-- Store GPU UUID assignments
INSERT INTO gpu_uuid_assignments (gpu_uuid, node_id, miner_id, gpu_name, last_verified)
VALUES ('GPU-550e8400...', '550e8400...', 5, 'NVIDIA H100 PCIe', CURRENT_TIMESTAMP);
-- Store hardware profile
INSERT INTO node_hardware_profile (miner_uid, node_id, cpu_model, cpu_cores, ram_gb, disk_gb)
VALUES (5, '550e8400...', 'AMD EPYC 9654', 96, 512, 7680);
-- Store verification result
INSERT INTO verification_logs (node_id, verification_type, score, success, details)
VALUES ('550e8400...', 'full', 1.0, 1, '{"gpu_count": 1, "model": "H100"}');
-- Update GPU profile for scoring
INSERT INTO miner_gpu_profiles (miner_uid, gpu_counts_json, total_score)
VALUES (5, '{"H100": 1}', 1.0)
ON CONFLICT (miner_uid) DO UPDATE
SET gpu_counts_json = '{"H100": 1}', total_score = 1.0, last_updated = CURRENT_TIMESTAMP;
Example Full Validation Log:
2024-01-01T12:00:10Z INFO [Full] Validating node 660f9511 (miner UID 5)
2024-01-01T12:00:11Z DEBUG [Full] SSH connected to 192.168.1.101:22
2024-01-01T12:00:12Z DEBUG [Full] Binary upload complete (15.2 MB in 1.2s)
2024-01-01T12:00:45Z DEBUG [Full] Binary execution complete (33.1s)
2024-01-01T12:00:46Z DEBUG [Full] Results downloaded (142 KB)
2024-01-01T12:00:46Z INFO [Full] Validation passed: 1 GPU (H100), Docker: ✓, Storage: ✓
2024-01-01T12:00:47Z INFO [Full] Node 660f9511 score: 1.00 (SSH: 0.50 + Binary: 0.50)
Trigger: Node recently validated (<6 hours, no failures)
Detailed Flow (miner_prover/verification.rs:1566-1581):
5b.1: SSH Connection Test:
ssh -i /tmp/validator_ssh_keys/ephemeral_550e8400.pem \
-o ConnectTimeout=10 \
basilica@192.168.1.100 \
echo "ok"
5b.2: Timestamp Update:
UPDATE verification_logs
SET last_health_check = CURRENT_TIMESTAMP
WHERE node_id = '550e8400...';
5b.3: Score Reuse:
// Reuse previous validation score if SSH succeeds
let score = if ssh_connected {
previous_validation.score
} else {
0.0 // SSH failed, mark as down
};
Example Lightweight Validation Log:
2024-01-01T12:00:10Z INFO [Lightweight] Checking node 550e8400 (miner UID 5)
2024-01-01T12:00:11Z DEBUG [Lightweight] SSH connected to 192.168.1.100:22
2024-01-01T12:00:11Z INFO [Lightweight] Node 550e8400 is accessible, reusing score: 1.00
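The two scoring paths (full and lightweight) reduce to a few lines each. The sketch below restates them as plain functions; the function names are hypothetical, and the rule that the binary half can only be earned when SSH connected is an assumption added here for clarity.

```rust
/// Score for a full validation: 0.5 for SSH connectivity plus 0.5 for a
/// passing binary validation.
/// Assumption: the binary half is only counted when SSH connected.
pub fn full_validation_score(ssh_connected: bool, validation_passed: bool) -> f64 {
    let ssh_score = if ssh_connected { 0.5 } else { 0.0 };
    let binary_score = if ssh_connected && validation_passed { 0.5 } else { 0.0 };
    ssh_score + binary_score
}

/// Score for a lightweight check: reuse the previous score when the node is
/// still reachable over SSH, otherwise mark it as down.
pub fn lightweight_score(ssh_connected: bool, previous_score: f64) -> f64 {
    if ssh_connected {
        previous_score
    } else {
        0.0
    }
}
```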
Trigger: After all nodes verified for a miner
Code Flow (miner_prover/verification.rs:280-313):
// Aggregate scores across all nodes
let total_score = node_scores.iter().sum::<f64>();
let average_score = total_score / node_scores.len() as f64;
// Update miner's overall score
db.update_miner_score(miner_uid, average_score).await?;
// Log result
info!(
"Miner UID {}: {} nodes verified, average score: {:.2}",
miner_uid,
node_scores.len(),
average_score
);
Example Aggregation Log:
2024-01-01T12:01:00Z INFO Verification round complete for miner UID 5
2024-01-01T12:01:00Z INFO Nodes verified: 3 (2 full, 1 lightweight)
2024-01-01T12:01:00Z INFO Node scores: [1.00, 1.00, 1.00]
2024-01-01T12:01:00Z INFO Miner UID 5 average score: 1.00
2024-01-01T12:01:00Z INFO GPU profile: H100=3
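The aggregation step is a plain arithmetic mean over the node scores. A minimal sketch follows, with one addition that is an assumption rather than documented behavior: it returns None for a miner with no nodes instead of dividing by zero.

```rust
/// Average the per-node scores of one miner.
/// Returns None when the miner has no nodes (assumed handling; the
/// snippet above does not show what the validator does in that case).
pub fn average_node_score(node_scores: &[f64]) -> Option<f64> {
    if node_scores.is_empty() {
        return None;
    }
    let total: f64 = node_scores.iter().sum();
    Some(total / node_scores.len() as f64)
}
```

For the three nodes in the log above, the scores [1.00, 1.00, 1.00] average to 1.00.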
Final Database State:
-- Miner GPU profile for weight setting
SELECT * FROM miner_gpu_profiles WHERE miner_uid = 5;
-- miner_uid | gpu_counts_json | total_score | last_updated
-- 5         | {"H100": 3}     | 1.00        | 2024-01-01 12:01:00
-- Individual node verification results
SELECT * FROM verification_logs WHERE node_id IN (SELECT id FROM miner_nodes WHERE miner_id = 5);
-- node_id  | verification_type | score | success | timestamp
-- 550e8400 | lightweight       | 1.00  | 1       | 2024-01-01 12:00:11
-- 660f9511 | full              | 1.00  | 1       | 2024-01-01 12:00:47
-- 770fa622 | full              | 1.00  | 1       | 2024-01-01 12:00:55
Concurrency Management (miner_prover/scheduler.rs:268-338):
// Create verification tasks for all selected miners
let tasks: Vec<_> = miners.iter()
.map(|miner| {
let engine = verification_engine.clone();
async move {
engine.verify_miner(miner).await
}
})
.collect();
// Execute with concurrency limit
let results = futures::stream::iter(tasks)
.buffer_unordered(config.max_concurrent_verifications)
.collect::<Vec<_>>()
.await;
Resource Limits:
- Lightweight verifications: Up to 50 concurrent
- Full validations: Up to 1024 concurrent
- Miners per round: Up to 20
Example Parallel Execution Log:
2024-01-01T12:00:00Z INFO Starting verification round: 20 miners selected
2024-01-01T12:00:00Z DEBUG Spawning 20 concurrent verification tasks
2024-01-01T12:00:00Z DEBUG Active verifications: 20 (lightweight: 15, full: 5)
2024-01-01T12:00:15Z DEBUG Completed: 8 verifications (12 remaining)
2024-01-01T12:00:30Z DEBUG Completed: 16 verifications (4 remaining)
2024-01-01T12:00:45Z DEBUG Completed: 20 verifications (0 remaining)
2024-01-01T12:00:45Z INFO Verification round complete: 20/20 successful (100%)
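The effect of a concurrency limit can be illustrated without an async runtime. The sketch below swaps in a different technique from the scheduler snippet above: instead of buffer_unordered on a futures stream, it uses a fixed pool of scoped standard-library threads that pull work items from a shared counter. The run_bounded name and its signature are assumptions made for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Run `work` over all items with at most `limit` items in flight at once.
/// Results are returned in the same order as `items`.
pub fn run_bounded<T, R, F>(items: &[T], limit: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<(usize, R)>> = Mutex::new(Vec::with_capacity(items.len()));
    // Never spawn more workers than there are items, and at least one.
    let workers = limit.max(1).min(items.len().max(1));

    thread::scope(|scope| {
        for _ in 0..workers {
            scope.spawn(|| loop {
                // Claim the next unprocessed index; stop when none are left.
                let i = next.fetch_add(1, Ordering::SeqCst);
                if i >= items.len() {
                    break;
                }
                let r = work(&items[i]);
                results.lock().unwrap().push((i, r));
            });
        }
    });

    // Restore input order, since workers finish in arbitrary order.
    let mut out = results.into_inner().unwrap();
    out.sort_by_key(|(i, _)| *i);
    out.into_iter().map(|(_, r)| r).collect()
}
```

Because each worker handles one item at a time, the number of in-flight verifications never exceeds the number of workers, which plays the same role as max_concurrent_verifications in the configuration.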
How validators distribute TAO emissions based on GPU performance and categories.
Goal: Fairly distribute subnet emissions across miners based on:
- GPU category (H100, A100, B200, etc.)
- GPU quantity per category
- Verification score (performance and reliability)
Configuration-Driven Allocation:
[emission]
burn_percentage = 5.0
burn_uid = 204
[emission.gpu_allocations]
H100 = { weight = 50.0, min_gpu_count = 4 }
A100 = { weight = 30.0, min_gpu_count = 2 }
B200 = { weight = 20.0, min_gpu_count = 1 }
Query Miner Profiles (scoring/gpu_scoring_engine.rs):
SELECT
miner_uid,
gpu_counts_json,
total_score,
verification_count
FROM miner_gpu_profiles
-- PostgreSQL interval syntax; on SQLite use datetime('now', '-6 hours') instead
WHERE last_successful_validation > NOW() - INTERVAL '6 hours'
AND total_score >= 0.1; -- min_score_threshold
Example Results:
miner_uid | gpu_counts_json         | total_score
5         | {"H100": 8}             | 0.95
12        | {"H100": 4, "A100": 16} | 0.92
23        | {"A100": 8}             | 0.88
45        | {"B200": 2}             | 0.85
For each GPU category, calculate scores:
H100 Category:
// Miner 5: 8x H100, score 0.95
let miner_5_h100_score = 0.95 * 8.0; // 7.6
// Miner 12: 4x H100, score 0.92
let miner_12_h100_score = 0.92 * 4.0; // 3.68
// Category total
let h100_total_score = miner_5_h100_score + miner_12_h100_score; // 11.28
A100 Category:
// Miner 12: 16x A100, score 0.92
let miner_12_a100_score = 0.92 * 16.0; // 14.72
// Miner 23: 8x A100, score 0.88
let miner_23_a100_score = 0.88 * 8.0; // 7.04
// Category total
let a100_total_score = miner_12_a100_score + miner_23_a100_score; // 21.76
B200 Category:
// Miner 45: 2x B200, score 0.85
let miner_45_b200_score = 0.85 * 2.0; // 1.7
// Category total
let b200_total_score = miner_45_b200_score; // 1.7
Burn Allocation (bittensor_core/weight_setter.rs:48-200):
// Total weight available
let total_weight = u16::MAX; // 65535
// Burn weight (5% of total, rounded to the nearest integer)
let burn_weight = (total_weight as f64 * 0.05).round() as u16; // 3277
// Assign to burn UID
weights[burn_uid] = burn_weight;
// Remaining for miners
let remaining_weight = total_weight - burn_weight; // 62258
Distribute remaining weight across GPU categories:
// H100 allocation (50% of remaining)
let h100_allocation = (remaining_weight as f64 * 0.50).round() as u16; // 31129
// A100 allocation (30% of remaining)
let a100_allocation = (remaining_weight as f64 * 0.30).round() as u16; // 18677
// B200 allocation (20% of remaining)
let b200_allocation = (remaining_weight as f64 * 0.20).round() as u16; // 12452
H100 Category Distribution:
// Miner 5: 7.6 / 11.28 = 0.674
let miner_5_h100_weight = (h100_allocation as f64 * 0.674).round() as u16; // 20981
// Miner 12: 3.68 / 11.28 = 0.326
let miner_12_h100_weight = (h100_allocation as f64 * 0.326).round() as u16; // 10148
A100 Category Distribution:
// Miner 12: 14.72 / 21.76 = 0.677
let miner_12_a100_weight = (a100_allocation as f64 * 0.677).round() as u16; // 12644
// Miner 23: 7.04 / 21.76 = 0.323
let miner_23_a100_weight = (a100_allocation as f64 * 0.323).round() as u16; // 6033
B200 Category Distribution:
// Miner 45: 1.7 / 1.7 = 1.0
let miner_45_b200_weight = b200_allocation; // 12452
Aggregate weights for miners with multiple categories:
// Miner 5 (H100 only)
weights[5] = 20981;
// Miner 12 (H100 + A100)
weights[12] = miner_12_h100_weight + miner_12_a100_weight; // 10148 + 12644 = 22792
// Miner 23 (A100 only)
weights[23] = 6033;
// Miner 45 (B200 only)
weights[45] = 12452;
// Burn UID
weights[204] = 3277;
Final Weight Vector:
UID   | Weight | Percentage | GPUs
------|--------|------------|-------------------
5     | 20981  | 32.0%      | 8x H100
12    | 22792  | 34.8%      | 4x H100 + 16x A100
23    | 6033   | 9.2%       | 8x A100
45    | 12452  | 19.0%      | 2x B200
204   | 3277   | 5.0%       | BURN
------|--------|------------|-------------------
Total | 65535  | 100.0%     |
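The whole calculation (burn, category pools, per-miner shares, aggregation) fits in one function. The sketch below is an illustration under stated assumptions, not the validator's implementation: MinerProfile and compute_weights are hypothetical names, min_gpu_count is not modeled, and shares are computed from unrounded ratios, so individual miner weights differ slightly from the hand-worked figures, which round each share to three decimals.

```rust
use std::collections::BTreeMap;

/// One miner's verified GPU profile (hypothetical shape for this sketch).
pub struct MinerProfile {
    pub uid: u16,
    pub score: f64,
    /// (GPU category, GPU count) pairs.
    pub gpus: Vec<(&'static str, u32)>,
}

/// Compute a weight per UID.
/// `allocations` holds (category, percent) pairs that should sum to 100.
pub fn compute_weights(
    miners: &[MinerProfile],
    allocations: &[(&str, f64)],
    burn_uid: u16,
    burn_percentage: f64,
) -> BTreeMap<u16, u16> {
    let total = u16::MAX as f64; // 65535
    let burn = (total * burn_percentage / 100.0).round();
    let remaining = total - burn;

    let mut weights: BTreeMap<u16, u16> = BTreeMap::new();
    if burn > 0.0 {
        weights.insert(burn_uid, burn as u16);
    }

    for &(category, percent) in allocations {
        // Share of the remaining weight reserved for this category.
        let pool = (remaining * percent / 100.0).round();

        // Category score per miner: verification score * GPU count.
        let scored: Vec<(u16, f64)> = miners
            .iter()
            .filter_map(|m| {
                m.gpus
                    .iter()
                    .find(|entry| entry.0 == category)
                    .map(|entry| (m.uid, m.score * f64::from(entry.1)))
            })
            .collect();

        let category_total: f64 = scored.iter().map(|(_, s)| *s).sum();
        if category_total <= 0.0 {
            continue; // no eligible miners in this category
        }

        // Split the pool proportionally and add to each miner's total.
        for (uid, s) in scored {
            let w = (pool * s / category_total).round() as u16;
            *weights.entry(uid).or_insert(0) += w;
        }
    }
    weights
}
```

Feeding in the four example miners with a 5% burn to UID 204 gives the same burn weight (3277) and B200 weight (12452) as the table, and the weights again sum to 65535. Per-miner rounding does not guarantee an exact total for every input, so a production implementation would need to decide where any remainder goes.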
Block-Based Timing (bittensor_core/weight_setter.rs:80-120):
// Check current block every 12 seconds
loop {
let current_block = subtensor.get_current_block().await?;
// Check if time to set weights
let blocks_since_last = current_block - last_weight_set_block;
if blocks_since_last >= config.weight_set_interval_blocks {
// Time to set weights
set_weights().await?;
last_weight_set_block = current_block;
}
sleep(Duration::from_secs(12)).await; // Bittensor block time
}
Configuration:
[emission]
weight_set_interval_blocks = 360 # ~72 minutes (360 * 12 seconds)
Example Weight Setting Log:
2024-01-01T12:00:00Z INFO Current block: 1234567
2024-01-01T12:00:00Z INFO Last weight set: block 1234207 (360 blocks ago)
2024-01-01T12:00:00Z INFO Triggering weight set operation
2024-01-01T12:00:05Z INFO Calculated weights for 52 miners
2024-01-01T12:00:05Z INFO Burn allocation: 5.0% to UID 204
2024-01-01T12:00:10Z INFO Submitted weights to chain (tx: 0xabcd...)
2024-01-01T12:00:15Z INFO Weight set confirmed in block 1234568
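The block-based trigger and the "~72 minutes" figure can be checked with a few lines. In this sketch the function names are hypothetical, and the saturating subtraction is an added safeguard for the case where the stored block number is ahead of the current one.

```rust
use std::time::Duration;

/// Bittensor block time used throughout this guide.
pub const BLOCK_TIME: Duration = Duration::from_secs(12);

/// True when at least `interval_blocks` blocks have passed since the last
/// weight set. Saturating subtraction avoids underflow if the stored block
/// is ahead of the current one.
pub fn should_set_weights(
    current_block: u64,
    last_weight_set_block: u64,
    interval_blocks: u64,
) -> bool {
    current_block.saturating_sub(last_weight_set_block) >= interval_blocks
}

/// Wall-clock length of an interval expressed in blocks.
pub fn interval_duration(interval_blocks: u64) -> Duration {
    Duration::from_secs(BLOCK_TIME.as_secs() * interval_blocks)
}
```

With the values from the log above (current block 1234567, last set at 1234207, interval 360) the check is true, and 360 blocks correspond to 4320 seconds, or 72 minutes.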
Database Tracking (persistence/emission_metrics.rs):
-- Record emission event
INSERT INTO emission_metrics (
timestamp,
burn_amount,
burn_percentage,
category_distributions_json,
total_miners,
weight_set_block
) VALUES (
CURRENT_TIMESTAMP,
3277,
5.0,
'{"H100": 31129, "A100": 18677, "B200": 12452}',
52,
1234568
);
-- Record individual weight allocations
INSERT INTO weight_allocation_history (
miner_uid,
gpu_category,
allocated_weight,
miner_score,
category_total_score,
weight_set_block,
emission_metrics_id
) VALUES
-- last_insert_id() stands for the id of the emission_metrics row inserted above
-- (last_insert_rowid() on SQLite, INSERT ... RETURNING id on PostgreSQL)
(5, 'H100', 20981, 0.95, 11.28, 1234568, last_insert_id()),
(12, 'H100', 10148, 0.92, 11.28, 1234568, last_insert_id()),
(12, 'A100', 12644, 0.92, 21.76, 1234568, last_insert_id()),
(23, 'A100', 6033, 0.88, 21.76, 1234568, last_insert_id()),
(45, 'B200', 12452, 0.85, 1.7, 1234568, last_insert_id());
Metrics Query Examples:
-- Get emission history
SELECT
timestamp,
burn_percentage,
total_miners,
weight_set_block
FROM emission_metrics
ORDER BY timestamp DESC
LIMIT 10;
-- Get miner weight history
SELECT
timestamp,
gpu_category,
allocated_weight,
miner_score
FROM weight_allocation_history
WHERE miner_uid = 12
ORDER BY timestamp DESC
LIMIT 20;
-- Get category distribution over time
SELECT
DATE(timestamp) as date,
AVG(CAST(json_extract(category_distributions_json, '$.H100') AS REAL)) as avg_h100_weight,
AVG(CAST(json_extract(category_distributions_json, '$.A100') AS REAL)) as avg_a100_weight
FROM emission_metrics
GROUP BY DATE(timestamp)
ORDER BY date DESC;
Critical security considerations and operational best practices for running a validator.
Why Ephemeral Keys?
- Limited Exposure: Keys exist only during verification session
- Automatic Rotation: New key generated for each verification
- No Long-Term Storage: Validator doesn't store keys after use
- Miner Control: Miner controls key lifetime on nodes
Key Lifecycle:
Generate → Send to Miner → Miner Deploys → Validation → Cleanup
<1s        <1s             <5s             30-120s      <1s
Enable Audit Logging:
[ssh_session]
enable_audit_logging = true
audit_log_path = "/var/log/basilica/ssh_audit.log"
Audit Log Analysis:
# View recent SSH activity
tail -f /var/log/basilica/ssh_audit.log
# Count connections per node
grep "connected to" /var/log/basilica/ssh_audit.log | \
awk '{print $6}' | sort | uniq -c | sort -rn
# Find failed connections
grep "connection failed" /var/log/basilica/ssh_audit.log
# Calculate average session duration
grep "disconnected from" /var/log/basilica/ssh_audit.log | \
awk '{print $NF}' | sed 's/[^0-9]//g' | \
awk '{sum+=$1; count++} END {if (count) print sum/count "s"}'
Hardening SSH Sessions:
[ssh_session]
# Connection timeouts prevent hanging connections
ssh_connection_timeout = { secs = 30, nanos = 0 }
ssh_command_timeout = { secs = 60, nanos = 0 }
# Retry logic for network issues
ssh_retry_attempts = 3
ssh_retry_delay = { secs = 2, nanos = 0 }
# Strict host key checking (false for dynamic nodes)
strict_host_key_checking = false
# Rate limiting prevents abuse
max_concurrent_sessions = 5
session_rate_limit = 20 # per hour
Minimal Firewall Rules:
# Deny all incoming by default
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH for administration
sudo ufw allow 22/tcp
# Allow Bittensor axon
sudo ufw allow 9090/tcp
# Allow API server
sudo ufw allow 8080/tcp
# Metrics are also served on port 9090 in this guide's configuration,
# so the axon rule above already covers them
# Enable firewall
sudo ufw enable
# Verify rules
sudo ufw status numbered
Advanced Rules with Rate Limiting:
# Rate limit SSH to prevent brute force
sudo ufw limit 22/tcp
# Rate limit API to prevent DoS
sudo ufw limit 8080/tcp
# Allow specific IP ranges for metrics
sudo ufw allow from 192.168.1.0/24 to any port 9090 proto tcp
Application-Level Rate Limiting (API server):
// Built into API server (code: api/mod.rs)
// - Request rate limiting per IP
// - Connection limits
// - Request size limits (1MB default)
External Protection (recommended for production):
- Use Cloudflare or similar CDN for API endpoints
- Deploy behind reverse proxy (nginx, traefik)
- Enable fail2ban for SSH protection
Wallet Best Practices:
- Separate Coldkey and Hotkey:
  # Coldkey (high security, offline storage)
  ~/.bittensor/wallets/validator/coldkey
  # Hotkey (online, on validator server)
  ~/.bittensor/wallets/validator/hotkeys/default
- Coldkey Storage:
  - Keep coldkey on offline, encrypted storage
  - Only use coldkey for staking/unstaking operations
  - Never store coldkey on validator server in production
- Hotkey Protection:
  # Secure permissions
  chmod 600 ~/.bittensor/wallets/validator/hotkeys/default
  # Ownership
  chown validator:validator ~/.bittensor/wallets/validator/hotkeys/default
- Backup Strategy:
  # Backup coldkey (offline storage)
  cp ~/.bittensor/wallets/validator/coldkey /secure/backup/location/
  # Backup hotkey (for disaster recovery)
  cp ~/.bittensor/wallets/validator/hotkeys/default /secure/backup/location/
  # Encrypt backups
  gpg --symmetric --cipher-algo AES256 /secure/backup/location/coldkey
Enabled by Default:
- All gRPC requests from miners verified with Bittensor signatures
- Prevents impersonation attacks
- Uses sr25519 cryptography (Substrate standard)
Verification Flow (code: crypto/core.rs:verify_bittensor_signature):
// Verify signature on all incoming requests
let signature_valid = verify_bittensor_signature(
&miner_hotkey,
&signature_hex,
&payload_bytes
)?;
if !signature_valid {
return Err(Status::unauthenticated("Invalid signature"));
}
File Permissions:
# Database file
chmod 600 data/validator.db
# Data directory
chmod 700 data/
# Ownership
chown validator:validator data/validator.db
Backup Strategy:
# Automated backups
*/15 * * * * sqlite3 /opt/basilica/data/validator.db ".backup '/opt/basilica/backups/validator_$(date +\%Y\%m\%d_\%H\%M\%S).db'"
# Retention policy (keep 7 days)
0 0 * * * find /opt/basilica/backups/ -name "validator_*.db" -mtime +7 -delete
Connection Security:
[database]
url = "postgresql://validator:STRONG_PASSWORD@localhost:5432/basilica?sslmode=require"
PostgreSQL Configuration:
# /etc/postgresql/15/main/postgresql.conf
ssl = on
ssl_cert_file = '/etc/ssl/certs/server.crt'
ssl_key_file = '/etc/ssl/private/server.key'
# /etc/postgresql/15/main/pg_hba.conf
# Require SSL connections
hostssl basilica validator 127.0.0.1/32 scram-sha-256
User Privileges (principle of least privilege):
-- Create validator database user
CREATE USER validator WITH PASSWORD 'STRONG_PASSWORD';
-- Grant only necessary privileges
GRANT CONNECT ON DATABASE basilica TO validator;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO validator;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA public TO validator;
-- Revoke dangerous privileges
REVOKE CREATE ON SCHEMA public FROM validator;
-- DROP is not a grantable privilege in PostgreSQL: only a table's owner (or a superuser)
-- can drop it, so let a separate role own the tables
Validator Health:
# Up/down status
up{job="basilica-validator"}
# API response time
basilica_api_request_duration_seconds{quantile="0.95"}
# Verification success rate
rate(basilica_verification_success_total[5m]) / rate(basilica_verification_total[5m])
Verification Performance:
# Verifications per minute
rate(basilica_verification_total[1m]) * 60
# Average verification duration
rate(basilica_verification_duration_seconds_sum[5m]) / rate(basilica_verification_duration_seconds_count[5m])
# Failed verifications
rate(basilica_verification_failed_total[5m])
Weight Setting:
# Time since last weight set
time() - basilica_last_weight_set_timestamp
# Weight set errors
rate(basilica_weight_set_errors_total[1h])
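The success-rate and staleness queries above are simple ratios and differences. The sketch below shows the same arithmetic outside Prometheus; all three function names are hypothetical, and the thresholds are passed in as parameters rather than fixed.

```rust
/// Verification success rate from two per-second rates.
/// Returns None when no verifications ran, where the ratio is undefined.
pub fn success_rate(success_per_sec: f64, total_per_sec: f64) -> Option<f64> {
    if total_per_sec > 0.0 {
        Some(success_per_sec / total_per_sec)
    } else {
        None
    }
}

/// Seconds elapsed since the last weight set (both values are Unix times).
pub fn seconds_since_weight_set(now_unix: u64, last_weight_set_unix: u64) -> u64 {
    now_unix.saturating_sub(last_weight_set_unix)
}

/// True when the last weight set is older than `max_age_secs`.
pub fn weights_stale(now_unix: u64, last_weight_set_unix: u64, max_age_secs: u64) -> bool {
    seconds_since_weight_set(now_unix, last_weight_set_unix) > max_age_secs
}
```

With a weight-set interval of roughly 72 minutes, a staleness limit of 7200 seconds tolerates one missed interval before the condition becomes true.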
Example Prometheus Alerts (prometheus/alerts.yml):
groups:
  - name: basilica_validator
    interval: 30s
    rules:
      - alert: ValidatorDown
        expr: up{job="basilica-validator"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Validator is down"
          description: "Validator has been down for 2 minutes"
      - alert: VerificationFailureRate
        expr: |
          rate(basilica_verification_failed_total[5m]) /
          rate(basilica_verification_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High verification failure rate"
          description: "More than 10% of verifications are failing"
      - alert: WeightSetStale
        expr: time() - basilica_last_weight_set_timestamp > 7200
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Weights not set recently"
          description: "Weights haven't been set in over 2 hours"
      - alert: DatabaseErrors
        expr: rate(basilica_database_errors_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database errors detected"
          description: "Database is experiencing errors"
Daily Tasks:
# Check validator health
curl http://localhost:8080/health
# Check recent verifications
journalctl -u basilica-validator --since "1 hour ago" | grep "Verification"
# Check database size
du -sh data/validator.db
Weekly Tasks:
# Review verification success rate
curl http://localhost:8080/verification/results | jq '.success_rate'
# Check disk space
df -h /opt/basilica
# Review error logs
journalctl -u basilica-validator -p err --since "1 week ago"
# Test backup restoration (on separate system)
cp backups/latest.db test/validator.db
sqlite3 test/validator.db "PRAGMA integrity_check;"
Monthly Tasks:
# Update validator software
git pull
./scripts/validator/build.sh
sudo systemctl restart basilica-validator
# Review and rotate logs
journalctl --vacuum-time=30d
# Database optimization (SQLite)
sqlite3 data/validator.db "VACUUM; ANALYZE;"
# Security audit
sudo apt update && sudo apt upgrade
sudo ufw status
Backup Strategy:
- Database: Automated backups every 15 minutes, retain 7 days
- Configuration: Version controlled, backed up daily
- Wallet: Encrypted offline backup, multiple locations
- SSH Keys: Ephemeral, no backup needed
Recovery Procedure:
- Server Failure:
  # On new server
  git clone https://github.com/your-org/basilica.git
  cd basilica/basilica
  ./scripts/validator/build.sh
  # Restore configuration
  cp backup/validator.toml /opt/basilica/config/
  # Restore wallet
  mkdir -p ~/.bittensor/wallets/validator/hotkeys/
  cp backup/hotkey ~/.bittensor/wallets/validator/hotkeys/default
  chmod 600 ~/.bittensor/wallets/validator/hotkeys/default
  # Restore database
  cp backup/validator.db /opt/basilica/data/
  # Start validator
  sudo systemctl start basilica-validator
- Database Corruption:
  # Stop validator
  sudo systemctl stop basilica-validator
  # Restore from backup
  cp backups/validator_LATEST.db data/validator.db
  # Verify integrity
  sqlite3 data/validator.db "PRAGMA integrity_check;"
  # Restart validator
  sudo systemctl start basilica-validator
- Wallet Compromise:
  # IMMEDIATE: Stop validator
  sudo systemctl stop basilica-validator
  # Create new hotkey
  btcli wallet new_hotkey --wallet.name validator --wallet.hotkey new_hotkey
  # Transfer stake to new hotkey (check the exact flags against your btcli version)
  btcli stake add --wallet.name validator --wallet.hotkey new_hotkey --amount ALL
  # The new hotkey must also be registered on the subnet before it can validate
  # Update configuration
  sed -i 's/hotkey_name = "default"/hotkey_name = "new_hotkey"/' config/validator.toml
  # Restart validator
  sudo systemctl start basilica-validator
Comprehensive monitoring setup for validators using Prometheus and Grafana.
Built-in Metrics (exported on port 9090 by default):
# CPU usage
basilica_cpu_usage_percent
# Memory usage
basilica_memory_usage_bytes
basilica_memory_available_bytes
# Disk usage
basilica_disk_usage_bytes{path="/opt/basilica/data"}
basilica_disk_available_bytes{path="/opt/basilica/data"}
# Network I/O
rate(basilica_network_received_bytes_total[1m])
rate(basilica_network_transmitted_bytes_total[1m])
# Total verifications
basilica_verification_total{type="full"}
basilica_verification_total{type="lightweight"}
# Verification success/failure
basilica_verification_success_total
basilica_verification_failed_total
# Verification duration
basilica_verification_duration_seconds{quantile="0.5"}
basilica_verification_duration_seconds{quantile="0.95"}
basilica_verification_duration_seconds{quantile="0.99"}
# Active verifications
basilica_verification_active
# Miner discovery
basilica_miners_discovered_total
basilica_miners_verified_total
# Weight set operations
basilica_weight_set_total
basilica_weight_set_errors_total
# Last weight set timestamp
basilica_last_weight_set_timestamp
# Weight set duration
basilica_weight_set_duration_seconds
# HTTP requests
basilica_http_requests_total{method="GET",path="/miners"}
basilica_http_requests_total{status="200"}
# Request duration
basilica_http_request_duration_seconds{quantile="0.95"}
# Active connections
basilica_http_connections_active
# Database connections
basilica_database_connections_active
basilica_database_connections_idle
# Query duration
basilica_database_query_duration_seconds{operation="select"}
basilica_database_query_duration_seconds{operation="insert"}
# Database size
basilica_database_size_bytes
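The counters above are typically consumed as rates rather than raw totals. As a rough illustration of what a success-rate panel computes, the sketch below derives a per-second rate from two counter samples and turns two rates into a percentage. It is a simplification (Prometheus itself compensates for counter resets instead of rejecting them), and the function names are hypothetical.

```rust
/// Per-second increase of a monotonically increasing counter between two samples.
/// Returns None for an empty window or an apparent counter reset.
fn counter_rate(prev: f64, curr: f64, elapsed_secs: f64) -> Option<f64> {
    if elapsed_secs <= 0.0 || curr < prev {
        return None;
    }
    Some((curr - prev) / elapsed_secs)
}

/// Success percentage from the success rate and the total rate.
/// Returns None when nothing was verified in the window.
fn success_rate_percent(success_rate: f64, total_rate: f64) -> Option<f64> {
    if total_rate <= 0.0 {
        None
    } else {
        Some(success_rate / total_rate * 100.0)
    }
}
```

Returning `None` for an idle window avoids the division by zero that would otherwise show up as `NaN` on a dashboard.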
Import Pre-built Dashboard:
- Access Grafana: http://localhost:3000
- Login: admin / admin
- Navigate: Dashboards → Import
- Upload: grafana/dashboards/validator.json
Key Panels:
- Overview:
  - Validator uptime
  - Total miners discovered
  - Verification success rate
  - Last weight set time
- Verification Performance:
  - Verifications per minute (graph)
  - Verification duration (heatmap)
  - Success/failure ratio (gauge)
  - Active verifications (graph)
- Resource Usage:
  - CPU usage (graph)
  - Memory usage (graph)
  - Disk usage (graph)
  - Network I/O (graph)
- API Performance:
  - Requests per second (graph)
  - Request duration (heatmap)
  - Error rate (graph)
  - Active connections (graph)
- Weight Setting:
  - Weight set history (table)
  - Burn percentage (stat)
  - Category allocations (pie chart)
  - Weight set duration (graph)
Example Dashboard JSON (abbreviated):
{
  "dashboard": {
    "title": "Basilica Validator",
    "panels": [
      {
        "title": "Verification Success Rate",
        "targets": [
          {
            "expr": "sum(rate(basilica_verification_success_total[5m])) / sum(rate(basilica_verification_total[5m])) * 100"
          }
        ],
        "type": "gauge"
      },
      {
        "title": "Verifications per Minute",
        "targets": [
          {
            "expr": "rate(basilica_verification_total[1m]) * 60",
            "legendFormat": "{{type}}"
          }
        ],
        "type": "graph"
      }
    ]
  }
}
HTTP Health Endpoint:
# Basic health check
curl http://localhost:8080/health
# Response (healthy)
{
"status": "healthy",
"timestamp": "2024-01-01T12:00:00Z",
"checks": {
"database": "ok",
"bittensor": "ok",
"verifications": "ok"
},
"metrics": {
"miners_discovered": 47,
"verifications_active": 12,
"last_weight_set": "2024-01-01T11:00:00Z"
}
}
# Response (unhealthy)
{
"status": "unhealthy",
"timestamp": "2024-01-01T12:00:00Z",
"checks": {
"database": "ok",
"bittensor": "error",
"verifications": "ok"
},
"errors": [
"Cannot connect to Bittensor chain"
]
}
Automated Health Monitoring:
# Add to cron for periodic checks
*/5 * * * * curl -f http://localhost:8080/health || echo "Validator unhealthy!" | mail -s "Validator Alert" admin@example.com
Docker Health Check:
# In docker-compose.yml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
Structured Logging with tracing:
Log Levels:
- TRACE: Very verbose, debugging internals
- DEBUG: Detailed operational information
- INFO: General operational information
- WARN: Warning conditions
- ERROR: Error conditions
Key Log Patterns:
# Verification activity
journalctl -u basilica-validator | grep "Verification"
# Weight setting events
journalctl -u basilica-validator | grep "Weight set"
# Error tracking
journalctl -u basilica-validator -p err
# Performance analysis (verification duration)
journalctl -u basilica-validator | grep "Validation complete" | \
awk '{print $(NF-1)}' | sed 's/[^0-9.]//g' | \
awk '{sum+=$1; count++} END {if (count) print "Average:", sum/count "s"; else print "No samples found"}'
# Miner discovery trends (prints journal timestamp and miner count)
journalctl -u basilica-validator --since "24 hours ago" | \
grep "Discovered.*miners" | \
sed -E 's/^([^ ]+ +[^ ]+ +[^ ]+) .*Discovered ([0-9]+) miners.*/\1 \2/'
Centralized Logging (optional):
# Ship logs to external service (e.g., Loki, Elasticsearch)
# Example: Promtail for Grafana Loki
# /etc/promtail/config.yml
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: basilica-validator
    journal:
      json: false
      max_age: 12h
      labels:
        job: systemd-journal
        service: basilica-validator
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: 'unit'
Common issues and solutions for validator operation.
Symptoms:
2024-01-01T12:00:00Z INFO Discovered 0 miners from metagraph
Possible Causes and Solutions:
- Wrong Netuid:
  # Check configuration
  grep "netuid" config/validator.toml
  # Should be:
  # [bittensor] netuid = 39 (mainnet)
  # [verification] netuid = 39
  # Verify with Bittensor CLI
  btcli metagraph --netuid 39 | grep "MINER"
- Chain Connection Issues:
  # Test chain connectivity
  curl -v wss://entrypoint-finney.opentensor.ai:443
  # Check logs for connection errors
  journalctl -u basilica-validator | grep "chain"
  # Try alternative chain endpoint
  # Edit config/validator.toml:
  # chain_endpoint = "wss://test.finney.opentensor.ai:443"
- Wallet Not Registered:
  # Check wallet registration
  btcli wallet overview --wallet.name validator --wallet.hotkey default
  # Re-register if needed
  btcli subnet register --netuid 39 --wallet.name validator --wallet.hotkey default
Symptoms:
2024-01-01T12:00:10Z ERROR [Full] SSH connection failed: Connection refused (192.168.1.100:22)
Possible Causes and Solutions:
- Miner Not Deploying SSH Keys:
  # Check if validator's SSH key was provided in auth request
  journalctl -u basilica-validator | grep "ssh_public_key"
  # Verify ephemeral key generation
  ls -la /tmp/validator_ssh_keys/
  # Test manual SSH (won't work if key not deployed)
  ssh -i /tmp/validator_ssh_keys/latest.pem basilica@192.168.1.100
- Network Connectivity:
  # Test network reachability
  ping 192.168.1.100
  # Test SSH port
  nc -zv 192.168.1.100 22
  # Traceroute to identify network issues
  traceroute 192.168.1.100
- Firewall Blocking Outbound SSH:
  # Check if outbound SSH is allowed
  sudo ufw status | grep 22
  # Allow outbound SSH if blocked
  sudo ufw allow out 22/tcp
- SSH Timeout Too Short:
  # Increase timeout in config/validator.toml
  [ssh_session]
  ssh_connection_timeout = { secs = 60, nanos = 0 }
  ssh_command_timeout = { secs = 120, nanos = 0 }
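The `{ secs = ..., nanos = ... }` tables used for timeouts have the same two-field shape as Rust's `std::time::Duration`. The sketch below shows that mapping. `DurationConfig` is a hypothetical stand-in, and it is an assumption that the validator's own config types behave the same way.

```rust
use std::time::Duration;

/// Hypothetical mirror of a `{ secs = ..., nanos = ... }` table in validator.toml.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DurationConfig {
    secs: u64,
    nanos: u32,
}

impl From<DurationConfig> for Duration {
    fn from(cfg: DurationConfig) -> Self {
        // `secs` carries whole seconds, `nanos` the sub-second remainder.
        Duration::new(cfg.secs, cfg.nanos)
    }
}
```

Under this reading, `{ secs = 60, nanos = 0 }` is a 60-second timeout, and a sub-second value such as 1.5 s would be written `{ secs = 1, nanos = 500000000 }`.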
Symptoms:
2024-01-01T12:00:00Z ERROR Failed to set weights: Insufficient stake
Possible Causes and Solutions:
- Insufficient Stake for Validator Permit:
  # Check current stake
  btcli wallet overview --wallet.name validator
  # Check minimum required stake
  btcli metagraph --netuid 39 | grep "VALIDATOR" | head -1
  # Add more stake
  btcli stake add --wallet.name validator --wallet.hotkey default --amount 5000
- Weight Vector Invalid:
  # Check logs for specific error
  journalctl -u basilica-validator | grep "set weights"
  # Common issues:
  # - Weights don't sum to 65535 (u16::MAX)
  # - Invalid UIDs in weight vector
  # - Empty weight vector
  # Verify GPU allocations total 100%
  grep "gpu_allocations" config/validator.toml -A 10
- Chain Transaction Failure:
  # Check for chain errors
  journalctl -u basilica-validator | grep -i "transaction\|extrinsic"
  # Possible solutions:
  # - Wait for next block and retry
  # - Check chain health: https://telemetry.polkadot.io/
  # - Verify wallet has sufficient funds for transaction fees
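The three weight-vector problems listed above (wrong sum, invalid UIDs, empty vector) can be checked before anything is submitted to the chain. The following is a minimal sketch of such a pre-flight check. The function name, the `neuron_count` parameter, and the duplicate-UID rule are assumptions made for illustration and are not taken from the validator source.

```rust
use std::collections::HashSet;

/// Expected total of all weights, per the troubleshooting notes (u16::MAX).
const WEIGHT_SUM: u32 = u16::MAX as u32;

/// Checks a weight vector for the common failure modes before submission.
fn validate_weight_vector(uids: &[u16], weights: &[u16], neuron_count: u16) -> Result<(), String> {
    if uids.is_empty() || weights.is_empty() {
        return Err("empty weight vector".to_string());
    }
    if uids.len() != weights.len() {
        return Err(format!("{} uids but {} weights", uids.len(), weights.len()));
    }
    let mut seen = HashSet::new();
    for &uid in uids {
        // A UID must refer to a slot that exists in the metagraph.
        if uid >= neuron_count {
            return Err(format!("uid {} is outside the metagraph (size {})", uid, neuron_count));
        }
        // Assumption: each UID may appear at most once.
        if !seen.insert(uid) {
            return Err(format!("uid {} appears more than once", uid));
        }
    }
    // Sum in u32 so that many u16 weights cannot overflow.
    let sum: u32 = weights.iter().map(|&w| u32::from(w)).sum();
    if sum != WEIGHT_SUM {
        return Err(format!("weights sum to {}, expected {}", sum, WEIGHT_SUM));
    }
    Ok(())
}
```

For example, `validate_weight_vector(&[0, 1], &[32768, 32767], 256)` passes, while the same call with weights `[1, 2]` returns an error naming the actual sum.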
Symptoms:
# Memory usage over 90%
free -h
# total used free
# Mem: 15Gi 14Gi 512Mi
Possible Causes and Solutions:
- Database Cache Too Large:
  # Reduce database connections
  [database]
  max_connections = 5 # Reduce from 10
  min_connections = 1
- Too Many Concurrent Verifications:
  # Reduce concurrency limits
  [verification]
  max_concurrent_verifications = 20 # Reduce from 50
  max_concurrent_full_validations = 512 # Reduce from 1024
- Memory Leak (rare):
  # Restart validator
  sudo systemctl restart basilica-validator
  # Monitor memory over time
  watch -n 5 free -h
  # If leak persists, report issue with logs
Symptoms:
2024-01-01T12:00:00Z ERROR Database error: database is locked
Possible Causes and Solutions:
- SQLite Lock Contention:
  # Check for other processes accessing database
  lsof data/validator.db
  # If using SQLite in production, consider PostgreSQL
  # SQLite not optimal for high concurrency
- Database Corruption:
  # Check integrity
  sqlite3 data/validator.db "PRAGMA integrity_check;"
  # If corrupted, restore from backup
  sudo systemctl stop basilica-validator
  cp backups/validator_LATEST.db data/validator.db
  sudo systemctl start basilica-validator
- Disk Full:
  # Check disk space
  df -h /opt/basilica
  # Clean up old logs
  journalctl --vacuum-time=7d
  # Enable database cleanup
  # In config/validator.toml:
  [cleanup]
  enabled = true
Symptoms:
curl http://localhost:8080/health
# curl: (7) Failed to connect to localhost port 8080: Connection refused
Possible Causes and Solutions:
- API Server Not Started:
  # Check if validator is running
  sudo systemctl status basilica-validator
  # Check logs for API startup
  journalctl -u basilica-validator | grep "API server listening"
  # Should see: "API server listening on 0.0.0.0:8080"
- Port Already in Use:
  # Check what's using port 8080
  sudo lsof -i :8080
  # Change API port in config
  [api]
  bind_address = "0.0.0.0:8081"
- Firewall Blocking:
  # Check firewall rules
  sudo ufw status | grep 8080
  # Allow port if blocked
  sudo ufw allow 8080/tcp
Symptoms:
2024-01-01T12:00:00Z WARN Verification timeout: node 550e8400 exceeded 120s
Possible Causes and Solutions:
- Slow Network Connection:
  # Test network speed to node
  iperf3 -c 192.168.1.100
  # Increase timeouts
  [verification]
  challenge_timeout = { secs = 300, nanos = 0 } # Increase from 120s
  [ssh_session]
  ssh_command_timeout = { secs = 300, nanos = 0 }
- Binary Execution Slow on Node:
  # Check if node has sufficient resources
  # May need to exclude slow nodes from verification
- Too Many Verifications in Parallel:
  # Reduce concurrency to avoid resource exhaustion
  [verification]
  max_concurrent_verifications = 25
  max_concurrent_full_validations = 256
Symptoms:
- Grafana shows "No data"
- Prometheus shows target as "Down"
Possible Causes and Solutions:
- Metrics Not Enabled:
  # Enable in config/validator.toml
  [metrics]
  enabled = true
  [metrics.prometheus]
  enabled = true
  port = 9090
- Prometheus Not Scraping:
  # Check Prometheus targets
  curl http://localhost:9091/targets
  # Verify Prometheus config
  cat scripts/validator/prometheus.yml
  # Should have:
  scrape_configs:
    - job_name: 'basilica-validator'
      static_configs:
        - targets: ['validator:9090']
- Network Issue Between Prometheus and Validator:
  # Test from Prometheus container
  docker exec basilica-prometheus curl http://validator:9090/metrics
  # Should return metrics output
Symptoms:
2024-01-01T12:00:00Z ERROR Wallet not found: ~/.bittensor/wallets/validator/hotkeys/default
Possible Causes and Solutions:
- Wrong Wallet Path:
  # Check actual wallet location
  ls -la ~/.bittensor/wallets/
  # Update config to match
  [bittensor]
  wallet_name = "actual_wallet_name"
  hotkey_name = "actual_hotkey_name"
- Wallet Not Mounted (Docker):
  # Verify volume mount
  docker inspect basilica-validator | grep Mounts -A 20
  # Should show:
  # "Source": "/home/user/.bittensor/wallets",
  # "Destination": "/root/.bittensor/wallets"
  # Recreate container with correct mount
  docker rm basilica-validator
  docker run -v ~/.bittensor/wallets:/root/.bittensor/wallets:ro ...
- Permissions Issue:
  # Check file permissions
  ls -la ~/.bittensor/wallets/validator/hotkeys/default
  # Fix if needed
  chmod 600 ~/.bittensor/wallets/validator/hotkeys/default
  chown $USER:$USER ~/.bittensor/wallets/validator/hotkeys/default
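The `chmod 600` fix above amounts to clearing every group and other permission bit on the hotkey file. A small sketch of the same check in code is shown below. Both function names are hypothetical, and the code is Unix-only because it relies on `std::os::unix::fs::PermissionsExt`.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// True when no group or other permission bits are set (e.g. 0o600 or 0o400).
fn is_owner_only(mode: u32) -> bool {
    // 0o077 masks the group and other rwx bits; file-type bits are ignored.
    (mode & 0o077) == 0
}

/// Reads the mode of `path` and reports whether only the owner can access it.
fn hotkey_is_private(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    Ok(is_owner_only(mode))
}
```

A startup check along these lines could warn when the hotkey file is readable by other users, instead of waiting for a wallet error at runtime.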
Advanced configurations and optimizations for experienced operators.
Implement Custom Strategy Selection:
The current validator uses a time-based strategy (full validation every 6 hours). For advanced use cases, you can modify the strategy selection logic.
Example: Score-Based Strategy:
// Pseudocode for custom strategy
fn determine_strategy_custom(node: &Node, history: &VerificationHistory, now: Instant) -> Strategy {
    // Pick the full-validation interval from the node's risk tier.
    // A single comparison at the end keeps the 12-hour tier from being
    // overridden by the 6-hour default.
    let full_interval = if history.average_score < 0.7 {
        // High-risk nodes: validate every 2 hours
        Duration::from_secs(2 * 3600)
    } else if history.average_score > 0.95 && history.consecutive_successes > 10 {
        // Low-risk nodes: validate every 12 hours
        Duration::from_secs(12 * 3600)
    } else {
        // Default: 6 hours
        Duration::from_secs(6 * 3600)
    };
    if now - history.last_validation > full_interval {
        Strategy::Full
    } else {
        Strategy::Lightweight
    }
}
Configuration Location: crates/basilica-validator/src/miner_prover/validation_strategy.rs
Multi-Validator Architecture:
┌─────────────────┐
│ Load Balancer │
│ (HAProxy) │
└────────┬────────┘
│
┌────────┴────────┐
│ │
┌─────▼─────┐ ┌────▼──────┐
│Validator 1│ │Validator 2│
│ (Active) │ │ (Standby) │
└─────┬─────┘ └────┬──────┘
│ │
└────────┬────────┘
│
┌────────▼────────┐
│ PostgreSQL │
│ (Shared DB) │
└─────────────────┘
Configuration:
- Shared Database (required):
  # Both validators use same PostgreSQL database
  [database]
  url = "postgresql://validator:pass@db-server:5432/basilica"
- Distributed Locking (prevent duplicate work):
  // Implemented in: crates/basilica-common/src/distributed/postgres_lock.rs
  // Ensures only one validator performs verification/weight-setting at a time
- Load Balancer Configuration (HAProxy example):
  frontend validator_api
      bind *:8080
      default_backend validator_servers
  backend validator_servers
      balance roundrobin
      option httpchk GET /health
      server validator1 192.168.1.10:8080 check
      server validator2 192.168.1.11:8080 check fall 3 rise 2
Failover Behavior:
- Active validator performs verifications and weight setting
- Standby monitors health via database heartbeat
- On active failure, standby takes over within 30 seconds
- Both validators can serve API requests (load balanced)
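The timing rule behind the failover behavior can be expressed as a small decision function: the standby promotes itself once the active validator's heartbeat is older than the takeover window. This is only a sketch of that rule. The `Role` enum and `next_role` function are hypothetical, and the actual coordination relies on the PostgreSQL lock mentioned above.

```rust
use std::time::Duration;

/// Hypothetical role of a validator in an active/standby pair.
#[derive(Debug, PartialEq, Eq)]
enum Role {
    Active,
    Standby,
}

/// Decide the standby's next role from the age of the active node's heartbeat.
fn next_role(heartbeat_age: Duration, takeover_after: Duration) -> Role {
    if heartbeat_age >= takeover_after {
        // The active node has been silent for too long: take over.
        Role::Active
    } else {
        Role::Standby
    }
}
```

With a 30-second window, a heartbeat that is 5 seconds old keeps the node in standby, while one that is 45 seconds old triggers promotion.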
SQLite Tuning (for moderate load):
-- Add to database initialization
PRAGMA journal_mode = WAL; -- Write-Ahead Logging for concurrency
PRAGMA synchronous = NORMAL; -- Balance safety and performance
PRAGMA cache_size = -64000; -- 64MB cache
PRAGMA temp_store = MEMORY; -- Temp tables in memory
PRAGMA mmap_size = 30000000000; -- 30GB memory-mapped I/O
PostgreSQL Tuning (for high load):
# /etc/postgresql/15/main/postgresql.conf
# Memory settings (adjust based on available RAM)
shared_buffers = 4GB
effective_cache_size = 12GB
maintenance_work_mem = 1GB
work_mem = 50MB
# Checkpointing
checkpoint_completion_target = 0.9
wal_buffers = 16MB
max_wal_size = 4GB
# Query planner
random_page_cost = 1.1 # For SSD storage
effective_io_concurrency = 200
# Parallelism
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 8
# Connection pooling (if using PgBouncer)
# Use transaction pooling mode for best performance
Maximize Concurrent Verifications:
[verification]
# Increase concurrent lightweight checks (low resource cost)
max_concurrent_verifications = 100
# Increase concurrent full validations (higher resource cost)
max_concurrent_full_validations = 2048
# Increase miners per round
max_miners_per_round = 50
# Reduce verification interval for faster discovery
verification_interval = { secs = 300, nanos = 0 } # 5 minutes
Worker Queue for Horizontal Scaling (experimental):
[verification]
# Enable worker queue for distributed execution
enable_worker_queue = true
Requirements:
- Redis instance for queue management
- Multiple validator workers (separate processes/servers)
- Shared PostgreSQL database
Architecture:
Validator (Scheduler) → Redis Queue → Worker 1, Worker 2, Worker N
↓
PostgreSQL (Shared)
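The scheduler-to-workers flow in the diagram can be illustrated in a single process with standard-library channels standing in for the Redis queue. This is a sketch of the fan-out pattern only, not the validator's worker implementation. The `Job` type and `run_workers` function are hypothetical.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical job type: the UID of a miner to verify.
type Job = u16;

/// Fans jobs out to `worker_count` threads and collects (worker_id, job) pairs.
fn run_workers(jobs: Vec<Job>, worker_count: usize) -> Vec<(usize, Job)> {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (done_tx, done_rx) = mpsc::channel::<(usize, Job)>();
    // Workers share one receiving end, guarded by a mutex.
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..worker_count)
        .map(|id| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => done_tx.send((id, job)).unwrap(),
                    Err(_) => break, // queue closed and drained
                }
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // close the queue so workers exit once it is empty
    drop(done_tx); // only the workers' clones keep the result channel open

    for handle in handles {
        handle.join().unwrap();
    }
    done_rx.iter().collect()
}
```

Each job is handled by exactly one worker, which is the property the shared queue is meant to provide. The order of results depends on thread scheduling.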
Connection Pooling:
[database]
# Optimize connection pool
max_connections = 20 # Increase from 10
min_connections = 5 # Increase from 1
[database.connect_timeout]
secs = 10 # Reduce from 30 for faster failures
SSH Connection Reuse:
The current implementation creates a new SSH connection for each verification. To reduce connection overhead, implement connection pooling:
// Pseudocode for SSH connection pool
struct SshConnectionPool {
pools: HashMap<String, Vec<SshConnection>>,
max_per_host: usize,
}
impl SshConnectionPool {
fn get_connection(&mut self, host: &str) -> Result<SshConnection> {
if let Some(conn) = self.pools.get_mut(host).and_then(|p| p.pop()) {
if conn.is_alive() {
return Ok(conn);
}
}
// Create new connection if pool empty
SshConnection::new(host)
}
fn return_connection(&mut self, host: String, conn: SshConnection) {
self.pools.entry(host).or_default().push(conn);
}
}
Implementation Location: crates/basilica-validator/src/ssh/connection_pool.rs (to be implemented)
Add New GPU Category:
[emission.gpu_allocations]
# Existing categories
H100 = { weight = 35.0, min_gpu_count = 4, min_gpu_vram = 80 }
A100 = { weight = 25.0, min_gpu_count = 2, min_gpu_vram = 40 }
B200 = { weight = 25.0, min_gpu_count = 1, min_gpu_vram = 192 }
H200 = { weight = 10.0, min_gpu_count = 1, min_gpu_vram = 141 }
# Add new category (e.g., L40S)
L40S = { weight = 5.0, min_gpu_count = 2, min_gpu_vram = 48 }
GPU Name Matching (code: scoring/gpu_categorization.rs):
// GPU models are matched by substring (the category name may appear anywhere in the model string)
// Example: "NVIDIA H100 PCIe" matches category "H100"
fn categorize_gpu(model: &str) -> Option<String> {
if model.contains("H100") {
Some("H100".to_string())
} else if model.contains("A100") {
Some("A100".to_string())
} else if model.contains("B200") {
Some("B200".to_string())
} else if model.contains("H200") {
Some("H200".to_string())
} else if model.contains("L40S") {
Some("L40S".to_string())
} else {
None // Unknown GPU, not eligible for weights
}
}
Weight Rebalancing:
- Ensure all gpu_allocations weights sum to 100.0
- Weights are percentages of total emissions (after burn)
- Higher weight = more emissions allocated to that category
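The rebalancing rules can be checked mechanically: the category weights must total 100.0, and each category then receives its percentage of whatever remains after the burn. The sketch below assumes `burn_percentage` is expressed on a 0 to 100 scale, which the configuration shown in this guide does not confirm. `category_shares` is a hypothetical helper.

```rust
/// Returns each category's fraction of total emissions after the burn.
/// Assumption: `burn_percentage` is on a 0..=100 scale.
fn category_shares(
    allocations: &[(&str, f64)],
    burn_percentage: f64,
) -> Result<Vec<(String, f64)>, String> {
    if !(0.0..=100.0).contains(&burn_percentage) {
        return Err(format!("burn_percentage {} is outside 0..=100", burn_percentage));
    }
    let total: f64 = allocations.iter().map(|(_, weight)| weight).sum();
    // Allow a small tolerance for floating-point rounding.
    if (total - 100.0).abs() > 1e-6 {
        return Err(format!("gpu_allocations weights sum to {}, expected 100.0", total));
    }
    let remaining = 1.0 - burn_percentage / 100.0;
    Ok(allocations
        .iter()
        .map(|(name, weight)| (name.to_string(), weight / 100.0 * remaining))
        .collect())
}
```

With the five categories above and a 10% burn, H100 would receive 0.35 × 0.9 = 0.315 of total emissions. Dropping the L40S entry without rebalancing makes the weights sum to 95.0 and the function returns an error.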
Enable API Key Authentication:
[api]
api_key = "your-secret-api-key-here"
Usage:
# Without API key (fails if enabled)
curl http://localhost:8080/miners
# Response: 401 Unauthorized
# With API key
curl -H "X-API-Key: your-secret-api-key-here" http://localhost:8080/miners
# Response: 200 OK with miner data
Advanced: JWT Authentication (code: api/auth.rs):
For external service integration, implement JWT-based authentication:
// Pseudocode for JWT auth (uses the jsonwebtoken crate; HS256 is an assumption)
struct JwtAuth {
    secret: String,
    issuer: String,
    audience: String,
}
impl JwtAuth {
    fn validate_token(&self, token: &str) -> Result<Claims> {
        // Reject tokens whose issuer or audience does not match
        let mut validation = Validation::new(Algorithm::HS256);
        validation.set_issuer(&[&self.issuer]);
        validation.set_audience(&[&self.audience]);
        decode::<Claims>(token, &DecodingKey::from_secret(self.secret.as_bytes()), &validation)
            .map(|data| data.claims)
    }
}
Integration with Auth0 (constants in common/src/auth_constants.rs):
// Auth0 configuration
pub const AUTH0_DOMAIN: &str = env!("AUTH0_DOMAIN");
pub const AUTH0_CLIENT_ID: &str = env!("AUTH0_CLIENT_ID");
pub const AUTH0_AUDIENCE: &str = env!("AUTH0_AUDIENCE");
This guide covered validator operation end to end, from initial setup to advanced optimization. Key takeaways:
Core Responsibilities:
- Discover miners from Bittensor metagraph
- Verify GPU nodes via direct SSH access
- Score miners based on performance and reliability
- Distribute emissions via weight setting
Two-Tier Verification:
- Full validation: Binary execution + hardware profiling (every 6 hours)
- Lightweight validation: SSH accessibility check (every 10 minutes)
Weight Setting:
- GPU category-based allocation (H100, A100, B200, etc.)
- Score-weighted distribution within categories
- Configurable burn mechanism
- Block-based timing (default: every 360 blocks)
Security:
- Ephemeral SSH keys for verification
- Cryptographic authentication with miners
- Audit logging for all SSH operations
- Wallet security best practices
Deployment:
- Four methods: Binary, Systemd, Docker, Docker Compose
- Automated deployment scripts
- Production-ready monitoring with Prometheus/Grafana
- Health checks and disaster recovery
Monitoring:
- Comprehensive Prometheus metrics
- Pre-built Grafana dashboards
- Alerting rules for critical conditions
- Log analysis and troubleshooting
Advanced Topics:
- High availability setup with load balancing
- Performance tuning for database and network
- Custom verification strategies
- API authentication and external integration
For additional support, refer to the relevant sections above or consult the codebase under crates/basilica-validator/.