[6/6] Add futility pruning #38

luccabb · 2026-01-20T04:29:27Z

Summary

Skip quiet moves at low depths when static evaluation is far below alpha
Margin-based pruning: depth 1 = 100cp, depth 2 = 200cp

Details

Futility Pruning

If the static evaluation plus a safety margin is still below alpha, quiet moves
are unlikely to improve the position enough to beat alpha. We skip them.

Implementation:

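# Before the move loop: decide once per node whether quiet moves may be skipped.
can_futility_prune = False  # default, so the flag is defined at depth > 2 or when in check
if depth <= 2 and not in_check:
    static_eval = self.eval_board(board)
    futility_margin = 100 * depth  # 100cp at depth 1, 200cp at depth 2
    can_futility_prune = static_eval + futility_margin < alpha

# In move loop:
if can_futility_prune and move_index > 0 and is_quiet_move:
    continue  # Skip this quiet move; the first move is always searched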
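The node-level decision can also be written as a pure function, which makes it easy to unit-test on its own. This is a minimal sketch rather than the engine's actual code: the standalone functions `futility_margin` and `can_futility_prune` and the parameters `margin_per_ply` and `max_depth` are assumptions introduced for illustration. Only the 100cp-per-ply margin, the depth limit of 2, and the in-check exclusion come from the description.

```python
def futility_margin(depth: int, margin_per_ply: int = 100) -> int:
    """Margin in centipawns: 100cp at depth 1, 200cp at depth 2."""
    return margin_per_ply * depth


def can_futility_prune(
    depth: int,
    in_check: bool,
    static_eval: int,
    alpha: int,
    max_depth: int = 2,  # hypothetical parameter; the PR hardcodes 2
) -> bool:
    """Return True when quiet moves at this node are candidates for skipping."""
    if depth > max_depth or in_check:
        return False
    # Even after adding the margin, the static evaluation stays below alpha.
    return static_eval + futility_margin(depth) < alpha
```

For example, with `static_eval = -150` and `alpha = 0`, the node qualifies at depth 1 (-150 + 100 = -50 < 0) but not at depth 2 (-150 + 200 = 50 >= 0).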

Safety Conditions

Pruning is conservative to avoid missing important moves:

Only at low depths (1-2 ply)
Never when in check
Never prune captures, checks, or promotions
Never prune the first move
Margin allows for reasonable positional improvement
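The per-move conditions in this list can be sketched as a filter over an ordered move list. The `MoveInfo` dataclass and the `moves_to_search` helper are hypothetical names used only for illustration; a real engine would derive the capture, check, and promotion flags from the board rather than store them on the move.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MoveInfo:
    """Hypothetical per-move flags; not part of the PR."""
    name: str
    is_capture: bool = False
    gives_check: bool = False
    is_promotion: bool = False

    @property
    def is_quiet(self) -> bool:
        # Quiet means: not a capture, not a check, not a promotion.
        return not (self.is_capture or self.gives_check or self.is_promotion)


def moves_to_search(moves: List[MoveInfo], can_futility_prune: bool) -> List[MoveInfo]:
    """Return the moves that survive futility pruning, preserving order."""
    kept = []
    for move_index, move in enumerate(moves):
        # Never prune the first move, and never prune tactical moves.
        if can_futility_prune and move_index > 0 and move.is_quiet:
            continue
        kept.append(move)
    return kept
```

With the list `Nf3, a3, Bxf7+, e8=Q, h3` and pruning enabled, only `Nf3` (first move), `Bxf7+` (capture with check), and `e8=Q` (promotion) are searched. With pruning disabled, the list is returned unchanged.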

Why it Works

At low depths near leaf nodes:

Quiet moves have limited potential to swing evaluation
If the static evaluation already trails alpha by more than the margin, a quiet move is unlikely to close the gap
Captures/checks are still searched (tactical opportunities)

Test plan

All 64 unit tests pass
Verified pruning only happens for quiet moves at low depth

🤖 Generated with Claude Code

Implements futility pruning to skip quiet moves that can't improve alpha:

**Futility Pruning:**
- At low depths (1-2), compute static evaluation
- If eval + margin < alpha, quiet moves can't help
- Skip quiet moves (no capture, check, or promotion)
- Never prune the first move (might be the only good one)

**Margin Calculation:**
- Depth 1: 100 centipawns margin
- Depth 2: 200 centipawns margin
- Larger margin at deeper depths allows for more potential improvement

**Conditions for pruning:**
- Depth <= 2
- Not in check (check positions are critical)
- Static eval + margin < alpha
- Move is quiet (not capture/check/promotion)
- Not the first move in the list

This is a forward pruning technique that can miss some moves, but the margins are conservative enough to rarely affect results while significantly reducing nodes searched.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


* Add OpenEnv-compatible RL environment with HuggingFace Space

This adds a chess reinforcement learning environment following the OpenEnv interface pattern, with both local and HTTP client-server modes.

Features:
- ChessEnvironment class with configurable rewards, opponents, and game limits
- FastAPI server with REST endpoints (/reset, /step, /state, /engine-move)
- HTTP client for remote environment access
- Web UI for playing against the engine
- HuggingFace Spaces deployment configuration (Dockerfile, openenv.yaml)
- Example training scripts for local and remote usage

Also includes:
- mypy configuration for optional RL dependencies
- Import formatting fixes for ufmt compliance

* Remove Elo claim and fix GitHub link to open in new tab

Fixes:
- Remove incorrect `bash .env` line (was trying to execute .env as script)
- Add `set -e` to exit on errors
- Check if brew is installed before using it
- Check if git-lfs/envsubst already installed before reinstalling
- Validate build succeeded before continuing
- Verify dist/moonfish exists before copying
- Check if lichess-bot directory exists
- Validate LICHESS_TOKEN is set after sourcing .env
- Validate token is not empty when creating .env
- Use `cp -f` instead of rm + cp

Improvements:
- Make lichess-bot directory configurable via LICHESS_BOT_DIR env var
- Add progress messages for better UX
- Provide helpful error messages with next steps

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

* Add Stockfish benchmark CI workflow
- Runs cutechess-cli matches against Stockfish on every PR
- 20 rounds with max concurrency
- Moonfish: 60s per move, Stockfish: Skill Level 5 with 60+5 time control
- Downloads full 170MB opening book from release assets (bypasses LFS)
- Reports win/loss/draw stats in GitHub job summary
- Uploads PGN and logs as artifacts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Parallelize Stockfish benchmark with matrix strategy
- Run 20 parallel jobs (10 chunks × 2 skill levels)
- Test against both Stockfish skill level 4 and 5
- 100 games per skill level = 200 total games for reliable signal
- Add aggregation job to combine results with summary table
- Use different random seeds per chunk for opening variety

* Add PR comment with benchmark results
- Post aggregated results as a comment on the PR
- Makes it easy to see win/loss/draw rates without navigating to CI
- Includes collapsible configuration details

* Add -repeat flag for more consistent benchmark results
- Each opening is played twice with colors reversed
- Eliminates first-move advantage variance
- Doubles games to 400 total (200 per skill level)
- More statistically reliable results between runs

* Add detailed stats to benchmark PR comment
- Show win rates by color (as White / as Black)
- Show loss reasons (timeout, checkmate, adjudication)
- Separate tables per skill level for clarity

* Fix termination parsing and correct game count
- Parse game endings from PGN move text (cutechess format)
- Track: checkmate, timeout, resignation, stalemate, repetition, 50-move
- Fix config: 200 total games (not 400)

* Simplify game endings - parse merged PGN directly
- Remove per-chunk termination tracking
- Parse game endings from merged PGN in aggregate step
- Cleaner and less error-prone

* Extract game endings dynamically from PGN text

* Filter out mates from game endings (redundant with win/loss)

* Rename to 'Non-checkmate endings'

* Add skill level 3 and skip aggregate if all jobs cancelled
- Test against Stockfish skill levels 3, 4, and 5 (300 total games)
- Only run aggregate job if at least one benchmark succeeded

* Hardcode concurrency to 10 for faster benchmarks

* Increase to 20 rounds and 20 concurrency (600 total games)

* Reduce to 5 chunks (15 total jobs, 300 games)

* Add PR reactions: eyes on start, thumbs up on complete
- React with 👀 when benchmark starts
- React with 👍 after results are posted

* Add local benchmark script

* Add skill level 1, increase to 200 games per level (800 total)

* Revert CI changes, update local script: skill level 1, 200 games/level

* Add skill level 2 to local benchmark script

* Update benchmark settings
- Local script: 100 rounds, 15 concurrency
- CI: Remove eyes reaction when adding thumbs up

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>


* Only run benchmarks when engine code changes
* Remove lichess from path filter (not engine code)
* Run benchmarks on PRs to any branch, not just master


github-actions · 2026-01-27T21:06:06Z

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

Metric	Losses	Total	Win %
Overall	100	100	0%
As White	50	50	0%
As Black	50	50	0%

vs Stockfish Skill Level 4

Metric	Wins	Losses	Total	Win %
Overall	1	99	100	1.0%
As White	0	50	50	0%
As Black	1	49	50	2.0%

vs Stockfish Skill Level 5

Metric	Losses	Total	Win %
Overall	100	100	0%
As White	50	50	0%
As Black	50	50	0%

Configuration

5 chunks × 20 rounds × 3 skill levels = 300 total games
Each opening played with colors reversed (-repeat) for fairness
Moonfish: 60s per move
Stockfish: 60+5 time control
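As a quick consistency check, the per-level tables can be summed and compared against the configured game count. This is an illustrative sketch only: the `results` dictionary is a hand-copied transcription of the wins and losses reported above, not output from the benchmark tooling, and the variable names are assumptions.

```python
# (wins, losses) per Stockfish skill level, copied by hand from the tables above.
results = {3: (0, 100), 4: (1, 99), 5: (0, 100)}

# Configuration from the bot comment: 5 chunks x 20 rounds x 3 skill levels.
chunks, rounds_per_chunk, skill_levels = 5, 20, len(results)
expected_games = chunks * rounds_per_chunk * skill_levels

total_wins = sum(wins for wins, _ in results.values())
total_games = sum(wins + losses for wins, losses in results.values())
overall_win_pct = 100 * total_wins / total_games

print(f"{total_wins}/{total_games} wins ({overall_win_pct:.2f}%)")
# prints "1/300 wins (0.33%)"
```

The summed total matches the configured 300 games, and the single win against skill level 4 gives an overall win rate of about 0.33%.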

luccabb force-pushed the feature/lmr-pvs branch from 3dd9239 to 42d2d60 Compare January 21, 2026 06:42

luccabb force-pushed the feature/futility-pruning branch from 9f14f89 to 5019fdf Compare January 21, 2026 06:42

luccabb force-pushed the feature/lmr-pvs branch from 42d2d60 to bd310e6 Compare January 21, 2026 06:44

luccabb force-pushed the feature/futility-pruning branch from 5019fdf to 369da0e Compare January 21, 2026 06:44

luccabb changed the title ~~[8/9] Add futility pruning~~ [6/7] Add futility pruning Jan 21, 2026

luccabb force-pushed the feature/lmr-pvs branch from bd310e6 to caf3d13 Compare January 21, 2026 07:33

luccabb force-pushed the feature/futility-pruning branch from 369da0e to fe21981 Compare January 21, 2026 07:33

luccabb changed the title ~~[6/7] Add futility pruning~~ [6/6] Add futility pruning Jan 21, 2026

luccabb and others added 5 commits January 22, 2026 12:23

Merge remote-tracking branch 'origin/master' into feature/futility-pr…

ef96475

…uning

luccabb mentioned this pull request Jan 27, 2026

Only run benchmarks when engine code changes #44

Merged

luccabb and others added 2 commits January 27, 2026 11:18

Only run benchmarks when engine code changes (#44)

b437392


Merge remote-tracking branch 'origin/master' into feature/futility-pr…

65d1f14

…uning
