[6/6] Add futility pruning #38
Open
luccabb wants to merge 8 commits into feature/lmr-pvs from feature/futility-pruning
Implements futility pruning to skip quiet moves that can't improve alpha:

**Futility Pruning:**
- At low depths (1-2), compute static evaluation
- If eval + margin < alpha, quiet moves can't help
- Skip quiet moves (no capture, check, or promotion)
- Never prune the first move (might be the only good one)

**Margin Calculation:**
- Depth 1: 100 centipawns margin
- Depth 2: 200 centipawns margin
- Larger margin at deeper depths allows for more potential improvement

**Conditions for pruning:**
- Depth <= 2
- Not in check (check positions are critical)
- Static eval + margin < alpha
- Move is quiet (not capture/check/promotion)
- Not the first move in the list

This is a forward pruning technique that can miss some moves, but the margins are conservative enough to rarely affect results while significantly reducing nodes searched.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
369da0e to
fe21981
Compare
* Add OpenEnv-compatible RL environment with HuggingFace Space

This adds a chess reinforcement learning environment following the OpenEnv interface pattern, with both local and HTTP client-server modes.

Features:
- ChessEnvironment class with configurable rewards, opponents, and game limits
- FastAPI server with REST endpoints (/reset, /step, /state, /engine-move)
- HTTP client for remote environment access
- Web UI for playing against the engine
- HuggingFace Spaces deployment configuration (Dockerfile, openenv.yaml)
- Example training scripts for local and remote usage

Also includes:
- mypy configuration for optional RL dependencies
- Import formatting fixes for ufmt compliance

* Remove Elo claim and fix GitHub link to open in new tab
Fixes:
- Remove incorrect `bash .env` line (was trying to execute .env as script)
- Add `set -e` to exit on errors
- Check if brew is installed before using it
- Check if git-lfs/envsubst already installed before reinstalling
- Validate build succeeded before continuing
- Verify dist/moonfish exists before copying
- Check if lichess-bot directory exists
- Validate LICHESS_TOKEN is set after sourcing .env
- Validate token is not empty when creating .env
- Use `cp -f` instead of rm + cp

Improvements:
- Make lichess-bot directory configurable via LICHESS_BOT_DIR env var
- Add progress messages for better UX
- Provide helpful error messages with next steps

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add Stockfish benchmark CI workflow
- Runs cutechess-cli matches against Stockfish on every PR
- 20 rounds with max concurrency
- Moonfish: 60s per move, Stockfish: Skill Level 5 with 60+5 time control
- Downloads full 170MB opening book from release assets (bypasses LFS)
- Reports win/loss/draw stats in GitHub job summary
- Uploads PGN and logs as artifacts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Parallelize Stockfish benchmark with matrix strategy
- Run 20 parallel jobs (10 chunks × 2 skill levels)
- Test against both Stockfish skill level 4 and 5
- 100 games per skill level = 200 total games for reliable signal
- Add aggregation job to combine results with summary table
- Use different random seeds per chunk for opening variety

* Add PR comment with benchmark results
- Post aggregated results as a comment on the PR
- Makes it easy to see win/loss/draw rates without navigating to CI
- Includes collapsible configuration details

* Add -repeat flag for more consistent benchmark results
- Each opening is played twice with colors reversed
- Eliminates first-move advantage variance
- Doubles games to 400 total (200 per skill level)
- More statistically reliable results between runs

* Add detailed stats to benchmark PR comment
- Show win rates by color (as White / as Black)
- Show loss reasons (timeout, checkmate, adjudication)
- Separate tables per skill level for clarity

* Fix termination parsing and correct game count
- Parse game endings from PGN move text (cutechess format)
- Track: checkmate, timeout, resignation, stalemate, repetition, 50-move
- Fix config: 200 total games (not 400)

* Simplify game endings - parse merged PGN directly
- Remove per-chunk termination tracking
- Parse game endings from merged PGN in aggregate step
- Cleaner and less error-prone

* Extract game endings dynamically from PGN text

* Filter out mates from game endings (redundant with win/loss)

* Rename to 'Non-checkmate endings'

* Add skill level 3 and skip aggregate if all jobs cancelled
- Test against Stockfish skill levels 3, 4, and 5 (300 total games)
- Only run aggregate job if at least one benchmark succeeded

* Hardcode concurrency to 10 for faster benchmarks

* Increase to 20 rounds and 20 concurrency (600 total games)

* Reduce to 5 chunks (15 total jobs, 300 games)

* Add PR reactions: eyes on start, thumbs up on complete
- React with 👀 when benchmark starts
- React with 👍 after results are posted

* Add local benchmark script

* Add skill level 1, increase to 200 games per level (800 total)

* Revert CI changes, update local script: skill level 1, 200 games/level

* Add skill level 2 to local benchmark script

* Update benchmark settings
- Local script: 100 rounds, 15 concurrency
- CI: Remove eyes reaction when adding thumbs up

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Only run benchmarks when engine code changes
* Remove lichess from path filter (not engine code)
* Run benchmarks on PRs to any branch, not just master
🔬 Stockfish Benchmark Results
vs Stockfish Skill Level 3
vs Stockfish Skill Level 4
vs Stockfish Skill Level 5
Configuration
Summary
Details
Futility Pruning
If the static evaluation plus a safety margin is still below alpha, quiet moves
are unlikely to improve the position enough to beat alpha. We skip them.
Implementation:
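A minimal sketch of the node-level check, using the margins stated in the commit message (100 centipawns at depth 1, 200 at depth 2). The names `FUTILITY_MARGINS` and `can_futility_prune` are hypothetical and chosen for illustration; they are not taken from the engine's source.

```python
# Hypothetical names; an illustration of the rule, not the engine's actual code.
# Margins in centipawns, keyed by remaining depth (values from the commit message).
FUTILITY_MARGINS = {1: 100, 2: 200}


def can_futility_prune(depth: int, in_check: bool, static_eval: int, alpha: int) -> bool:
    """Return True if quiet moves at this node are candidates for pruning."""
    margin = FUTILITY_MARGINS.get(depth)
    # Only depths 1 and 2 have a margin, and positions in check are never pruned.
    if margin is None or in_check:
        return False
    # Even an optimistic bonus of `margin` would not lift the eval up to alpha.
    return static_eval + margin < alpha
```

For example, with `alpha = 150` and a static evaluation of 0, the check passes at depth 1 (0 + 100 < 150) but fails at depth 2 (0 + 200 is not below 150), so the wider margin keeps more moves alive when more depth remains.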
Safety Conditions
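Pruning is conservative to avoid missing important moves:
- Only at depth <= 2
- Never when in check
- Only when static eval + margin < alpha
- Only quiet moves (no captures, checks, or promotions)
- Never the first move in the list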
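The per-move safety conditions from the commit message (keep the first move, keep captures, checks, and promotions) could be sketched as a filter over the move list. The `Move` dataclass and the function names below are stand-ins invented for this example; a real engine would query its board representation for these properties instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Move:
    """Hypothetical stand-in for an engine move with precomputed flags."""
    name: str
    is_capture: bool = False
    gives_check: bool = False
    is_promotion: bool = False


def is_quiet(move: Move) -> bool:
    """A quiet move is neither a capture, a check, nor a promotion."""
    return not (move.is_capture or move.gives_check or move.is_promotion)


def moves_to_search(moves: List[Move], node_is_futile: bool) -> List[Move]:
    """Drop quiet moves at a futile node, but always keep the first move."""
    kept = []
    for index, move in enumerate(moves):
        if node_is_futile and index > 0 and is_quiet(move):
            continue  # skipped: a quiet move is unlikely to raise the score to alpha
        kept.append(move)
    return kept
```

Here `node_is_futile` stands for the node-level condition (low depth, not in check, static eval + margin < alpha). Keeping the first move guarantees the node always searches at least one move and returns a real score.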
Why it Works
At low depths near leaf nodes:
Test plan
🤖 Generated with Claude Code