A high-frequency game server optimization project using a massive multiplayer snake battle royale as the test scenario.
The project is now parked at this checkpoint, having achieved decent performance improvements while leaving room for future optimization work.
Learn and experiment with low-level optimization techniques in Rust applying them to a game server that pushes tick rates to extreme limits.
Achieved Performance: ~8.6 microseconds per tick with 1000 snakes (116,000+ ticks per second).
Massive Multiplayer Snake Battle Royale
- 1000+ snakes moving simultaneously on a shared 2D grid
- Each snake moves 1 unit in any direction per tick
- Snake collisions, food consumption, and territory conflicts
- Real-time performance constraints with sub-millisecond tick processing
The application runs as a single process with two dedicated threads pinned to separate CPU cores with a custom minimal Lock Free Queue for IPC:
+--------------------------------------------------------------------------+
| Game Server Process (main.rs) |
| |
| +-----------------------+ +----------------------------------+ |
| | Input Generator |------->| Game Logic | |
| | (Thread 1, Core A) | SPSC | (Thread 2, Core B) | |
| | | Queue | | |
| | - Generates random | | - Polls queue for inputs | |
| | (id, dir) inputs | | - Updates snake directions | |
| | - Pushes to queue | | - Executes game tick() | |
| +-----------------------+ +----------------------------------+ |
| |
+--------------------------------------------------------------------------+
- Game Engine: Complete snake movement, collision detection, and state management with spatial batching optimization
- Grid System: 10000×10000 cell grid with efficient spatial queries
- Snake Logic: Movement, growth, collision detection, and lifecycle management using TinyDeque
- Apple System: Food spawning and consumption mechanics
- SPSC Queue: Lock-free single-producer, single-consumer queue for inter-thread communication
- CPU Pinning: Thread affinity to specific CPU cores for cache locality
- Benchmark Suite: Comprehensive performance testing with Criterion.rs
- SPSC queue throughput and latency benchmarks
- Game logic performance benchmarks
- Integrated hot path measurements
- Hardware performance counter benchmarks
- Real-time Profiling: CPU cycle-accurate measurements using
_rdtsc() - Performance Metrics: Ticks per second, hot path latency, consume vs tick breakdown
- Cache Performance Analysis: cache hit rates and memory access patterns
- Branch Prediction Analysis: Branch misprediction rates and optimization opportunities
- SPSC Benchmarks: Queue throughput, latency, and contention testing
- Game Benchmarks: Pure game logic performance with varying snake counts and input loads
- Integrated Benchmarks: Complete hot path measurement (consume + tick) with pinned threads
- Performance Counter Benchmarks: Hardware-level cache hit rates, branch prediction, and IPC measurements
- Comprehensive Analysis: Multi-run statistical analysis across snake counts with JSON output
- Non-parallelizable core logic: Snake collisions and food consumption must be resolved sequentially
- Hot path critical: Movement, collision detection, and state updates happen every tick
- Memory access patterns: Random grid access creates cache pressure
- Real-time requirements: Sub-millisecond tick processing
- Rust 1.70+
- At least 2 CPU cores (for thread pinning)
Normal optimized run:
cargo run --releaseWith profiling enabled:
cargo run --profile profile --features profileRun benchmarks:
# SPSC queue performance
cargo bench --bench spsc_bench
# Game logic performance
cargo bench --bench game_bench
# Integrated hot path performance
cargo bench --bench integrated_bench
# Hardware performance counters (cache hit rate, branch prediction, IPC)
cargo bench --bench perf_counters_bench
# Comprehensive performance analysis across snake counts
cd benches && python3 perf_summary.pyWhen running with profiling enabled, you'll see real-time performance metrics:
Snake Battle Royale: Low Level Optimization Playground
Input generator thread started on core 0
Game logic thread started on core 1
Tick 1000: 10254.46 ticks/sec | Consume: avg=24203 cycles, min=13462 cycles, max=312781 cycles | Tick: avg=353914 cycles, min=290931 cycles, max=6333712 cycles
Tick 2000: 10653.39 ticks/sec | Consume: avg=24056 cycles, min=4450 cycles, max=312781 cycles | Tick: avg=339884 cycles, min=258011 cycles, max=6333712 cycles
Hardware Performance Counters:
# Single snake count analysis
cargo bench --bench perf_counters_bench perf_counters/100_snakes
# All snake counts with statistical analysis
cd benches && python3 perf_summary.pyLegacy Performance Tools (perf directory):
# Cache performance measurement
./perf/measure_cache.sh
# Pipeline performance analysis
./perf/measure_pipeline.shNote: The new perf_counters_bench and perf_summary.py provide more integrated and reliable performance analysis. See benches/README.md for detailed usage.
src/
├── main.rs # Application entry point with thread setup
├── lib.rs # Library root
├── game/ # Core game logic
│ ├── engine.rs # Game state and main tick loop (vector-based)
│ ├── grid.rs # 2D grid with spatial queries (10000×10000)
│ ├── snake.rs # Snake movement and lifecycle
│ ├── apple.rs # Food spawning and consumption
│ ├── generator.rs # Deterministic and random game state generation
│ └── types.rs # Game data structures
├── ipc/ # Inter-process communication
│ └── spsc.rs # Lock-free SPSC queue implementation
└── tests.rs # Comprehensive unit tests
benches/ # Performance benchmarks
├── spsc_bench.rs # SPSC queue performance tests
├── game_bench.rs # Game logic performance tests
├── integrated_bench.rs # Complete hot path measurements
├── perf_counters_bench.rs # Hardware performance counter benchmarks
├── perf_summary.py # Comprehensive performance analysis script
├── run_bench.sh # Benchmark runner with various options
└── README.md # Detailed benchmark documentation
perf/ # Legacy performance analysis tools
├── measure_cache.sh # Cache performance measurement scripts
├── measure_pipeline.sh # Pipeline performance analysis
└── perf_summary.json # Performance metrics database