Python RL training framework for the CRForge Clash Royale simulator.
```
Python (Gymnasium)  --ZMQ-->  Java (CRForge Simulation)
  env.step()          JSON      GameEngine.tick()
  env.reset()         PAIR      30 FPS deterministic
```
- Python 3.10+
- Java 17 (for the bridge server)
- The crforge project built:

  ```bash
  ./gradlew build
  ```

```bash
# Basic install (env + bridge client)
pip install -e python/
# With training dependencies (SB3 + TensorBoard)
pip install -e "python/[train]"
```

```bash
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
./gradlew :gym-bridge:run
```

The server listens on tcp://localhost:9876 by default.

```bash
python python/examples/run_episodes.py
```

This runs 3 episodes with random actions and prints per-episode stats.

```bash
python python/examples/train_ppo.py --timesteps 50000
```

Monitor training:

```bash
tensorboard --logdir logs/ppo_crforge
```

```bash
python python/examples/evaluate.py --model models/ppo_crforge --episodes 50
```

```python
MultiDiscrete([2, 4, 18, 32])
```

| Index | Meaning | Values |
|---|---|---|
| 0 | action_type | 0=no-op, 1=play card |
| 1 | hand_index | 0-3 (which card slot) |
| 2 | tile_x | 0-17 (arena column) |
| 3 | tile_y | 0-31 (arena row) |
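
As a minimal sketch of how the four indices in the table above fit together, the helper below checks an action vector against the `[2, 4, 18, 32]` bounds and names its fields. `decode_action`, `ACTION_FIELDS`, and `ACTION_BOUNDS` are hypothetical names used for illustration; they are not part of `crforge_gym`.

```python
# Hypothetical helper, not part of crforge_gym: names the four action slots.
ACTION_FIELDS = ("action_type", "hand_index", "tile_x", "tile_y")
ACTION_BOUNDS = (2, 4, 18, 32)  # sizes taken from MultiDiscrete([2, 4, 18, 32])


def decode_action(action):
    """Validate a 4-element action and return it as a dict of named fields."""
    values = [int(v) for v in action]
    if len(values) != len(ACTION_BOUNDS):
        raise ValueError(f"expected {len(ACTION_BOUNDS)} values, got {len(values)}")
    for name, value, bound in zip(ACTION_FIELDS, values, ACTION_BOUNDS):
        if not 0 <= value < bound:
            raise ValueError(f"{name}={value} is outside 0..{bound - 1}")
    return dict(zip(ACTION_FIELDS, values))


# Play the card in hand slot 2 at arena column 9, row 20.
print(decode_action([1, 2, 9, 20]))
```

A check like this can catch out-of-range values before they reach the bridge, which is mainly useful for hand-written opponents or scripted tests.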
All observation values are float32 for SB3 compatibility. Spatial coordinates are normalized to [0, 1].
| Key | Shape | Description |
|---|---|---|
| frame | (1,) | Current simulation frame |
| game_time | (1,) | Game time in seconds (0-600) |
| is_overtime | (1,) | 1.0 if overtime |
| elixir | (2,) | [blue, red] elixir (0-10) |
| crowns | (2,) | [blue, red] crown count |
| hand_costs | (4,) | Card costs / 10 (normalized) |
| hand_types | (4,) | 0=troop, 1=spell, 2=building |
| next_card_cost | (1,) | Next card cost / 10 |
| next_card_type | (1,) | Next card type |
| towers | (6, 4) | [hp_frac, x_norm, y_norm, alive] |
| entities | (64, 7) | [team, type, move, x, y, hp, shield] |
| num_entities | (1,) | Active entity count |
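
To size a network input, it can help to know how long the observation becomes once flattened. The sketch below sums the element counts of the shapes in the table above. It assumes that flattening simply concatenates every key; whether `FlattenedObsWrapper` does exactly that, and in which order, is an assumption here.

```python
from math import prod

# Shapes copied from the observation table above.
OBS_SHAPES = {
    "frame": (1,),
    "game_time": (1,),
    "is_overtime": (1,),
    "elixir": (2,),
    "crowns": (2,),
    "hand_costs": (4,),
    "hand_types": (4,),
    "next_card_cost": (1,),
    "next_card_type": (1,),
    "towers": (6, 4),
    "entities": (64, 7),
    "num_entities": (1,),
}


def flattened_size(shapes):
    """Total number of scalars if every entry is concatenated into one vector."""
    return sum(prod(shape) for shape in shapes.values())


print(flattened_size(OBS_SHAPES))  # 490
```

The `entities` block accounts for 448 of the 490 values, so most of a flattened observation is entity slots, many of which may be unused padding early in a match.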
| Source | Magnitude | Purpose |
|---|---|---|
| Tower damage | +0.005/HP | Incentivize attacking |
| Crown earned | +1.0 | Major milestone |
| Win | +5.0 | Terminal reward |
| Loss | -5.0 | Terminal penalty |
| Elixir waste | -0.005/step | Penalize capping at 10 |
| Time penalty | -0.0001/step | Discourage passive play |
| Invalid action | -0.01/step | Penalize unaffordable plays |
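
The rows above can be read as terms of a per-step sum. The sketch below is one plausible way to combine them, not the environment's actual implementation: the function name, its parameters, and the assumption that all terms are simply added each step are illustrative.

```python
# Illustrative only: combines the reward table's magnitudes into one number.
def step_reward(tower_damage_hp=0.0, crowns_earned=0, won=False, lost=False,
                elixir_capped=False, invalid_action=False):
    """Sum the reward terms from the table for a single environment step."""
    reward = 0.005 * tower_damage_hp   # tower damage: +0.005 per HP
    reward += 1.0 * crowns_earned      # crown earned: +1.0 each
    if won:
        reward += 5.0                  # terminal win bonus
    if lost:
        reward -= 5.0                  # terminal loss penalty
    if elixir_capped:
        reward -= 0.005                # elixir waste while capped at 10
    reward -= 0.0001                   # time penalty, applied every step
    if invalid_action:
        reward -= 0.01                 # unaffordable or otherwise failed play
    return reward


# 200 HP of tower damage in one step, nothing else happening.
print(round(step_reward(tower_damage_hp=200), 4))  # 0.9999
```

Under these assumptions, 200 HP of tower damage is worth about as much as one crown, and the time penalty only becomes noticeable over thousands of steps.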
```python
env = CRForgeEnv(
    endpoint="tcp://localhost:9876",      # Bridge server address
    blue_deck=["knight", "archer", ...],  # 8 card IDs
    red_deck=["knight", "archer", ...],   # 8 card IDs
    level=11,                             # Card/tower level (1-15)
    ticks_per_step=6,                     # Sim ticks per step (default: 6 = ~5 decisions/sec)
    opponent="random",                    # "random", "noop", or callable
    invalid_action_penalty=-0.01,         # Penalty for failed actions
)
```

Pass `seed` to `env.reset()` for reproducible episodes:

```python
obs, info = env.reset(seed=42)  # Same seed -> same deck shuffle
```

For SB3's MlpPolicy, wrap the env to flatten Dict obs into a single vector:

```python
from crforge_gym.wrappers import FlattenedObsWrapper

env = FlattenedObsWrapper(CRForgeEnv(...))
```

The integration tests require the Java bridge server to be running:
```bash
# Start server in one terminal
./gradlew :gym-bridge:run
# Run tests in another
CRFORGE_INTEGRATION=1 pytest python/tests/ -v
```

- Java bridge (`gym-bridge/`): ZMQ PAIR server, wraps GameEngine in a step/reset API
- Python bridge (`crforge_gym/bridge.py`): ZMQ PAIR client, JSON protocol
- Gymnasium env (`crforge_gym/env.py`): Wraps bridge client in standard Gym interface
- Wrappers (`crforge_gym/wrappers.py`): FlattenedObsWrapper for SB3 compatibility
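
To make the "JSON protocol" between the two bridges more concrete, here is a rough sketch of what step and reset messages could look like. The message keys (`type`, `action`, `seed`) and the helper names are hypothetical; the real schema lives in `crforge_gym/bridge.py` and the Java bridge and may differ.

```python
import json


def encode_step(action):
    """Serialize a step request. The 'type' and 'action' keys are hypothetical."""
    return json.dumps({"type": "step", "action": [int(v) for v in action]})


def encode_reset(seed=None):
    """Serialize a reset request, including a seed only when one is given."""
    message = {"type": "reset"}
    if seed is not None:
        message["seed"] = int(seed)
    return json.dumps(message)


def decode_reply(raw):
    """Parse a reply string and require a JSON object at the top level."""
    reply = json.loads(raw)
    if not isinstance(reply, dict):
        raise ValueError("expected a JSON object from the bridge server")
    return reply


print(encode_step([1, 2, 9, 20]))
print(encode_reset(seed=42))
```

With pyzmq, strings like these would typically travel over a `zmq.PAIR` socket using `send_string` and `recv_string`, one request followed by one reply.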