
[2/6] Fix transposition table with proper bounds #32

Open

luccabb wants to merge 9 commits into feature/psqt-evaluation-performance from feature/transposition-table-bounds

Conversation

@luccabb luccabb commented Jan 20, 2026

Summary

  • Add Bound enum (EXACT, LOWER_BOUND, UPPER_BOUND) for correct TT behavior
  • Use Zobrist hash as cache key instead of FEN string (faster)
  • Store bound type and depth with each cache entry
  • Fix null move pruning condition (was missing null_move parameter check)
  • Update parallel engines to use new cache format
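As a rough, self-contained illustration of the Zobrist-key idea from the summary (this is not the engine's actual implementation; if the project uses python-chess, `chess.polyglot.zobrist_hash` is a likely candidate for the real key):

```python
import random

random.seed(0xC0FFEE)  # fixed seed so keys are reproducible across runs

# 12 piece types (6 per color) x 64 squares, plus a side-to-move key.
PIECE_KEYS = [[random.getrandbits(64) for _ in range(64)] for _ in range(12)]
SIDE_KEY = random.getrandbits(64)

def zobrist(pieces, white_to_move):
    """Hash a position given as an iterable of (piece_index 0-11, square 0-63).

    XOR-ing per-piece keys makes the hash order-independent and lets a real
    engine update it incrementally: moving a piece XORs out the old square's
    key and XORs in the new one.
    """
    h = SIDE_KEY if white_to_move else 0
    for piece, square in pieces:
        h ^= PIECE_KEYS[piece][square]
    return h
```

A single 64-bit integer key is much cheaper to compute and compare than building and hashing a FEN string on every lookup, which is the speedup the summary refers to.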

Details

The transposition table now correctly stores and uses bound information:

  • EXACT: Score is within the alpha-beta window, can be used directly
  • LOWER_BOUND: Node failed high (score >= beta), true score is at least this value
  • UPPER_BOUND: Node failed low (score <= alpha), true score is at most this value

This prevents incorrect cutoffs and score usage that can cause search instability.
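The lookup and store rules above can be sketched as follows (a minimal, self-contained sketch with hypothetical names and a plain-dict cache, not the engine's actual code):

```python
from enum import Enum, auto

class Bound(Enum):
    EXACT = auto()        # score was inside the (alpha, beta) window
    LOWER_BOUND = auto()  # fail-high: true score >= stored score
    UPPER_BOUND = auto()  # fail-low:  true score <= stored score

def store(cache, key, score, depth, alpha, beta):
    """Classify the score against the original window and cache it."""
    if score <= alpha:
        bound = Bound.UPPER_BOUND
    elif score >= beta:
        bound = Bound.LOWER_BOUND
    else:
        bound = Bound.EXACT
    cache[key] = (score, depth, bound)

def probe(cache, key, depth, alpha, beta):
    """Return a usable cached score, or None if the entry can't be trusted here."""
    entry = cache.get(key)
    if entry is None:
        return None
    score, stored_depth, bound = entry
    if stored_depth < depth:  # entry comes from a shallower search
        return None
    if bound is Bound.EXACT:
        return score
    if bound is Bound.LOWER_BOUND and score >= beta:
        return score  # still strong enough to cause a beta cutoff
    if bound is Bound.UPPER_BOUND and score <= alpha:
        return score  # still fails low against the current window
    return None
```

The depth check is what prevents a shallow entry from short-circuiting a deeper search, and the bound checks are what prevent a fail-high or fail-low score from being treated as exact.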

Parallel Engine Updates

All parallel engines (lazy_smp, l1p, l2p) updated to:

  • Use Zobrist hash for cache key lookup
  • Use context managers for proper Pool/Manager cleanup
  • Fix score negation in l1p (opponent perspective -> our perspective)
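The last two bullets can be sketched together (hypothetical worker and names, assuming the standard library's `multiprocessing`; the real engines' signatures will differ):

```python
from multiprocessing import Manager, Pool

def evaluate_move(args):
    """Hypothetical worker: receives a score from the opponent's perspective."""
    move, score_from_opponent = args
    return move, -score_from_opponent  # negate: opponent's view -> ours

def best_move(candidates):
    # `with` blocks guarantee the Pool and Manager are torn down even if a
    # worker raises, which is the resource-cleanup fix described above.
    with Manager() as manager, Pool() as pool:
        shared_cache = manager.dict()  # shared TT across workers (unused here)
        results = pool.map(evaluate_move, candidates)
    return max(results, key=lambda r: r[1])
```

Without the negation, the root would pick the move that is best for the opponent; with it, maximizing over the returned scores selects our best move.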

Test plan

  • All 64 unit tests pass
  • Verified correct bound handling in search

🤖 Generated with Claude Code

@luccabb luccabb force-pushed the feature/transposition-table-bounds branch from bb2ac9f to f833c45 Compare January 21, 2026 06:40
@luccabb luccabb force-pushed the feature/psqt-evaluation-performance branch from bca3f3e to d7df55a Compare January 21, 2026 06:43
@luccabb luccabb force-pushed the feature/transposition-table-bounds branch from f833c45 to 7a14812 Compare January 21, 2026 06:43
@luccabb luccabb changed the base branch from feature/psqt-evaluation-performance to master January 21, 2026 07:00
@luccabb luccabb changed the base branch from master to feature/psqt-evaluation-performance January 21, 2026 07:06
@luccabb luccabb changed the title [2/9] Fix transposition table with proper bounds [2/7] Fix transposition table with proper bounds Jan 21, 2026
@luccabb luccabb force-pushed the feature/psqt-evaluation-performance branch from d7df55a to e21928a Compare January 21, 2026 07:33
luccabb and others added 2 commits January 20, 2026 23:33
Implements correct transposition table behavior with bound types:

**Transposition Table Changes:**
- Add `Bound` enum: EXACT, LOWER_BOUND, UPPER_BOUND
- Use Zobrist hash as cache key (fast integer vs slow FEN string)
- Store bound type and depth with each cache entry
- Only use cached scores when depth is sufficient
- Properly handle bound types in lookups:
  - EXACT: use score directly
  - LOWER_BOUND: use if score >= beta (fail high)
  - UPPER_BOUND: use if score <= alpha (fail low)

**Null Move Pruning Fix:**
- Added missing `null_move` parameter check (was always trying null move)

**Parallel Engine Updates:**
- Update lazy_smp, l1p, l2p to use new zobrist hash cache key
- Add context managers for Pool/Manager (proper resource cleanup)
- Fix score negation in l1p (opponent perspective -> our perspective)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
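The null-move fix in the commit above can be sketched roughly like this (hypothetical names and a hypothetical reduction constant; the real engine's signatures may differ):

```python
NULL_MOVE_REDUCTION = 2  # depth reduction R; an assumed value

def try_null_move(board, depth, beta, null_move, in_check, search):
    """Attempt a null move only when the caller permits it.

    The `null_move` flag is the previously missing guard: without it, the
    search would try a null move even right after one (or in positions where
    the caller disabled it). Null move is also skipped when shallow or in
    check, where "passing" is unsound.
    """
    if not null_move or depth < NULL_MOVE_REDUCTION + 1 or in_check:
        return None
    board.push_null()  # give the opponent a free move
    score = -search(board, depth - 1 - NULL_MOVE_REDUCTION,
                    -beta, -beta + 1, null_move=False)  # no consecutive nulls
    board.pop_null()
    if score >= beta:
        return score  # fail-high: the node can be pruned
    return None
```

Passing `null_move=False` into the reduced search is what prevents two null moves in a row, which would otherwise let the side to move "pass" twice and corrupt the score.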
@luccabb luccabb force-pushed the feature/transposition-table-bounds branch from 4f7fad5 to 1c7b8d9 Compare January 21, 2026 07:33
@luccabb luccabb changed the title [2/7] Fix transposition table with proper bounds [2/6] Fix transposition table with proper bounds Jan 21, 2026
luccabb and others added 5 commits January 22, 2026 12:23
This adds a chess reinforcement learning environment following the
OpenEnv interface pattern, with both local and HTTP client-server modes.

Features:
- ChessEnvironment class with configurable rewards, opponents, and game limits
- FastAPI server with REST endpoints (/reset, /step, /state, /engine-move)
- HTTP client for remote environment access
- Web UI for playing against the engine
- HuggingFace Spaces deployment configuration (Dockerfile, openenv.yaml)
- Example training scripts for local and remote usage

Also includes:
- mypy configuration for optional RL dependencies
- Import formatting fixes for ufmt compliance
* Add OpenEnv-compatible RL environment with HuggingFace Space


* Remove Elo claim and fix GitHub link to open in new tab
Fixes:
- Remove incorrect `bash .env` line (was trying to execute .env as script)
- Add `set -e` to exit on errors
- Check if brew is installed before using it
- Check if git-lfs/envsubst already installed before reinstalling
- Validate build succeeded before continuing
- Verify dist/moonfish exists before copying
- Check if lichess-bot directory exists
- Validate LICHESS_TOKEN is set after sourcing .env
- Validate token is not empty when creating .env
- Use `cp -f` instead of rm + cp

Improvements:
- Make lichess-bot directory configurable via LICHESS_BOT_DIR env var
- Add progress messages for better UX
- Provide helpful error messages with next steps

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* Add Stockfish benchmark CI workflow

- Runs cutechess-cli matches against Stockfish on every PR
- 20 rounds with max concurrency
- Moonfish: 60s per move, Stockfish: Skill Level 5 with 60+5 time control
- Downloads full 170MB opening book from release assets (bypasses LFS)
- Reports win/loss/draw stats in GitHub job summary
- Uploads PGN and logs as artifacts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Parallelize Stockfish benchmark with matrix strategy

- Run 20 parallel jobs (10 chunks × 2 skill levels)
- Test against both Stockfish skill level 4 and 5
- 100 games per skill level = 200 total games for reliable signal
- Add aggregation job to combine results with summary table
- Use different random seeds per chunk for opening variety

* Add PR comment with benchmark results

- Post aggregated results as a comment on the PR
- Makes it easy to see win/loss/draw rates without navigating to CI
- Includes collapsible configuration details

* Add -repeat flag for more consistent benchmark results

- Each opening is played twice with colors reversed
- Eliminates first-move advantage variance
- Doubles games to 400 total (200 per skill level)
- More statistically reliable results between runs

* Add detailed stats to benchmark PR comment

- Show win rates by color (as White / as Black)
- Show loss reasons (timeout, checkmate, adjudication)
- Separate tables per skill level for clarity

* Fix termination parsing and correct game count

- Parse game endings from PGN move text (cutechess format)
- Track: checkmate, timeout, resignation, stalemate, repetition, 50-move
- Fix config: 200 total games (not 400)

* Simplify game endings - parse merged PGN directly

- Remove per-chunk termination tracking
- Parse game endings from merged PGN in aggregate step
- Cleaner and less error-prone

* Extract game endings dynamically from PGN text

* Filter out mates from game endings (redundant with win/loss)

* Rename to 'Non-checkmate endings'

* Add skill level 3 and skip aggregate if all jobs cancelled

- Test against Stockfish skill levels 3, 4, and 5 (300 total games)
- Only run aggregate job if at least one benchmark succeeded

* Hardcode concurrency to 10 for faster benchmarks

* Increase to 20 rounds and 20 concurrency (600 total games)

* Reduce to 5 chunks (15 total jobs, 300 games)

* Add PR reactions: eyes on start, thumbs up on complete

- React with 👀 when benchmark starts
- React with 👍 after results are posted

* Add local benchmark script

* Add skill level 1, increase to 200 games per level (800 total)

* Revert CI changes, update local script: skill level 1, 200 games/level

* Add skill level 2 to local benchmark script

* Update benchmark settings

- Local script: 100 rounds, 15 concurrency
- CI: Remove eyes reaction when adding thumbs up

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
luccabb and others added 2 commits January 27, 2026 11:18
* Only run benchmarks when engine code changes

* Remove lichess from path filter (not engine code)

* Run benchmarks on PRs to any branch, not just master
@github-actions

🔬 Stockfish Benchmark Results

vs Stockfish Skill Level 3

| Metric   | Wins | Losses | Draws | Total | Win % |
|----------|------|--------|-------|-------|-------|
| Overall  | 10   | 88     | 2     | 100   | 10.0% |
| As White | 7    | 42     | 1     | 50    | 14.0% |
| As Black | 3    | 46     | 1     | 50    | 6.0%  |

Non-checkmate endings:

  • Draw by 3-fold repetition: 2

vs Stockfish Skill Level 4

| Metric   | Wins | Losses | Draws | Total | Win % |
|----------|------|--------|-------|-------|-------|
| Overall  | 7    | 91     | 2     | 100   | 7.0%  |
| As White | 3    | 46     | 1     | 50    | 6.0%  |
| As Black | 4    | 45     | 1     | 50    | 8.0%  |

Non-checkmate endings:

  • Draw by insufficient mating material: 1
  • Draw by 3-fold repetition: 1

vs Stockfish Skill Level 5

| Metric   | Wins | Losses | Draws | Total | Win % |
|----------|------|--------|-------|-------|-------|
| Overall  | 2    | 97     | 1     | 100   | 2.0%  |
| As White | 2    | 47     | 1     | 50    | 4.0%  |
| As Black | 0    | 50     | 0     | 50    | 0.0%  |

Non-checkmate endings:

  • Draw by 3-fold repetition: 1

Configuration
  • 5 chunks × 20 rounds × 3 skill levels = 300 total games
  • Each opening played with colors reversed (-repeat) for fairness
  • Moonfish: 60s per move
  • Stockfish: 60+5 time control

