CRForge Gymnasium Environment

Python RL training framework for the CRForge Clash Royale simulator.

Python (Gymnasium)  --ZMQ-->  Java (CRForge Simulation)
   env.step()       JSON       GameEngine.tick()
   env.reset()      PAIR       30 FPS deterministic

Prerequisites

  • Python 3.10+
  • Java 17 (for the bridge server)
  • The crforge project built (./gradlew build)

Installation

# Basic install (env + bridge client)
pip install -e python/

# With training dependencies (SB3 + TensorBoard)
pip install -e "python/[train]"

Quick Start

1. Start the Java bridge server

# macOS only; on other platforms, point JAVA_HOME at a JDK 17 install
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
./gradlew :gym-bridge:run

The server listens on tcp://localhost:9876 by default.

2. Run random episodes (smoke test)

python python/examples/run_episodes.py

This runs 3 episodes with random actions and prints per-episode stats.
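The smoke test is the standard Gymnasium interaction loop. Below is a minimal sketch of such a loop, relying only on the Gymnasium reset/step API (reset returns (obs, info), step returns a 5-tuple). The function name run_random_episodes and the stats it records are illustrative assumptions, not the contents of run_episodes.py.

```python
from typing import Any


def run_random_episodes(env: Any, num_episodes: int = 3, seed: int = 0) -> list:
    """Play episodes with uniformly random actions and collect simple stats.

    Works with any env following the Gymnasium API: reset(seed=...) returns
    (obs, info) and step(action) returns (obs, reward, terminated, truncated, info).
    """
    stats = []
    for episode in range(num_episodes):
        # Offset the seed so episodes differ from each other but stay reproducible.
        obs, info = env.reset(seed=seed + episode)
        total_reward, steps, done = 0.0, 0, False
        while not done:
            action = env.action_space.sample()  # random MultiDiscrete action
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += float(reward)
            steps += 1
            done = terminated or truncated
        stats.append({"episode": episode, "steps": steps, "return": total_reward})
    return stats
```

With the bridge server running, something like run_random_episodes(CRForgeEnv(...)) would print-ready stats per episode; the loop itself never touches CRForge-specific fields.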

3. Train a PPO agent

python python/examples/train_ppo.py --timesteps 50000

Monitor training:

tensorboard --logdir logs/ppo_crforge

4. Evaluate a trained model

python python/examples/evaluate.py --model models/ppo_crforge --episodes 50

Environment Details

Action Space

MultiDiscrete([2, 4, 18, 32])

Index  Meaning      Values
0      action_type  0=no-op, 1=play card
1      hand_index   0-3 (which card slot)
2      tile_x       0-17 (arena column)
3      tile_y       0-31 (arena row)
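A raw action is just four integers, so giving them names and range checks can help when logging or debugging agent behaviour. A minimal sketch using the bounds from the table above; CardAction and decode_action are hypothetical helpers, not part of crforge_gym.

```python
from dataclasses import dataclass

# Upper bound (exclusive) of each MultiDiscrete component, from the action table.
ACTION_NVEC = (2, 4, 18, 32)


@dataclass(frozen=True)
class CardAction:
    action_type: int  # 0 = no-op, 1 = play card
    hand_index: int   # card slot 0-3
    tile_x: int       # arena column 0-17
    tile_y: int       # arena row 0-31

    @property
    def is_noop(self) -> bool:
        return self.action_type == 0


def decode_action(action) -> CardAction:
    """Convert a raw MultiDiscrete sample (list or array) into named, range-checked fields."""
    values = [int(v) for v in action]
    if len(values) != len(ACTION_NVEC):
        raise ValueError(f"expected {len(ACTION_NVEC)} components, got {len(values)}")
    for value, bound in zip(values, ACTION_NVEC):
        if not 0 <= value < bound:
            raise ValueError(f"component {value} outside [0, {bound})")
    return CardAction(*values)
```

For a no-op (action_type 0) the remaining three components are presumably ignored by the env; the sketch still range-checks them because action_space.sample() always produces in-range values.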

Observation Space

All float32 for SB3 compatibility. Spatial coordinates normalized to [0, 1].

Key             Shape    Description
frame           (1,)     Current simulation frame
game_time       (1,)     Game time in seconds (0-600)
is_overtime     (1,)     1.0 if overtime
elixir          (2,)     [blue, red] elixir (0-10)
crowns          (2,)     [blue, red] crown count
hand_costs      (4,)     Card costs / 10 (normalized)
hand_types      (4,)     0=troop, 1=spell, 2=building
next_card_cost  (1,)     Next card cost / 10
next_card_type  (1,)     Next card type
towers          (6, 4)   [hp_frac, x_norm, y_norm, alive]
entities        (64, 7)  [team, type, move, x, y, hp, shield]
num_entities    (1,)     Active entity count
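The entities array is fixed-size (64 rows), while num_entities says how many rows are in use. A sketch of extracting the live rows, assuming (not confirmed by this README) that active entities are packed into the first num_entities rows and the rest is zero padding; active_entities is a hypothetical helper.

```python
# Column order of each row in the "entities" array, from the observation table.
ENTITY_FIELDS = ("team", "type", "move", "x", "y", "hp", "shield")


def active_entities(entities, num_entities) -> list:
    """Return the active rows of the padded (64, 7) entities array as dicts.

    Assumes live entities occupy the first num_entities rows; the rest is padding.
    """
    count = int(num_entities[0])  # num_entities has shape (1,)
    return [dict(zip(ENTITY_FIELDS, (float(v) for v in row))) for row in entities[:count]]
```

The same slicing works on numpy arrays returned by the env as well as on plain nested lists.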

Reward Structure

Source          Magnitude      Purpose
Tower damage    +0.005/HP      Incentivize attacking
Crown earned    +1.0           Major milestone
Win             +5.0           Terminal reward
Loss            -5.0           Terminal penalty
Elixir waste    -0.005/step    Penalize capping at 10
Time penalty    -0.0001/step   Discourage passive play
Invalid action  -0.01/step     Penalize unaffordable plays
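To see how the terms combine within one step, here is a sketch that sums the magnitudes from the table above. The function name, its arguments, and the "win"/"loss" outcome encoding are hypothetical; the real env may track these quantities differently (for example, whether damage taken is also penalized is not stated here).

```python
from typing import Optional

# Magnitudes from the reward table.
TOWER_DAMAGE_PER_HP = 0.005
CROWN_REWARD = 1.0
TERMINAL_REWARD = 5.0
ELIXIR_WASTE_PENALTY = -0.005
TIME_PENALTY = -0.0001
INVALID_ACTION_PENALTY = -0.01


def step_reward(tower_hp_dealt: float, crowns_earned: int, elixir_capped: bool,
                invalid_action: bool, outcome: Optional[str] = None) -> float:
    """Sum the shaped reward terms for a single env step.

    outcome is "win", "loss", or None while the episode is still running.
    """
    reward = TOWER_DAMAGE_PER_HP * tower_hp_dealt
    reward += CROWN_REWARD * crowns_earned
    reward += TIME_PENALTY  # charged on every step
    if elixir_capped:
        reward += ELIXIR_WASTE_PENALTY
    if invalid_action:
        reward += INVALID_ACTION_PENALTY
    if outcome == "win":
        reward += TERMINAL_REWARD
    elif outcome == "loss":
        reward -= TERMINAL_REWARD
    return reward
```

For example, a step that deals 200 tower HP and earns a crown yields 1.0 + 1.0 - 0.0001 = 1.9999, so a single crown is worth far more than the per-step shaping terms.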

Configuration

from crforge_gym.env import CRForgeEnv

env = CRForgeEnv(
    endpoint="tcp://localhost:9876",  # Bridge server address
    blue_deck=["knight", "archer", ...],  # 8 card IDs
    red_deck=["knight", "archer", ...],   # 8 card IDs
    level=11,                        # Card/tower level (1-15)
    ticks_per_step=6,                # Sim ticks per step (default 6: 30 FPS / 6 = 5 decisions/sec)
    opponent="random",               # "random", "noop", or callable
    invalid_action_penalty=-0.01,    # Penalty for failed actions
)
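ticks_per_step sets both the decision rate and the episode length in steps. A small sketch of that arithmetic, using the 30 FPS simulation rate from the overview and assuming the 0-600 s game_time range in the observation table is a hard cap on episode duration; the helper names are hypothetical.

```python
SIM_FPS = 30  # the Java simulation ticks at 30 FPS


def decisions_per_second(ticks_per_step: int, fps: int = SIM_FPS) -> float:
    """Agent decisions per simulated second."""
    return fps / ticks_per_step


def max_steps_per_episode(ticks_per_step: int, max_game_seconds: float = 600.0,
                          fps: int = SIM_FPS) -> int:
    """Upper bound on env steps, assuming game_time never exceeds max_game_seconds."""
    return int(max_game_seconds * fps) // ticks_per_step
```

With the default ticks_per_step=6 this gives 5 decisions per second and at most 3000 steps, so the time penalty can total at most 3000 × 0.0001 = 0.3 per episode. Halving ticks_per_step doubles both the decision rate and the episode length.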

Deterministic Seeding

Pass seed to env.reset() for reproducible episodes:

obs, info = env.reset(seed=42)  # Same seed -> same deck shuffle
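The guarantee is that the seed alone determines the shuffle. The sketch below illustrates that property with Python's random.Random; it is only an analogy, since the actual shuffle happens on the Java side with its own RNG, so orders will not match the simulator. The card names are placeholders.

```python
import random


def shuffled_deck(deck: list, seed: int) -> list:
    """Return a shuffled copy of deck whose order depends only on seed."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    order = list(deck)         # copy, so the caller's list is not mutated
    rng.shuffle(order)
    return order
```

Using a per-call RNG instance rather than the module-level random functions keeps episodes reproducible even if other code consumes global randomness in between.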

FlattenedObsWrapper

For SB3's MlpPolicy, wrap the env to flatten Dict obs into a single vector:

from crforge_gym.env import CRForgeEnv
from crforge_gym.wrappers import FlattenedObsWrapper
env = FlattenedObsWrapper(CRForgeEnv(...))
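The size of the flattened vector follows from the shapes in the observation table. A sketch of that computation, assuming the wrapper concatenates every key in full without dropping or adding features; OBS_SHAPES and flat_obs_size are illustrative names.

```python
from math import prod

# Shape of each Dict observation key, from the observation table.
OBS_SHAPES = {
    "frame": (1,), "game_time": (1,), "is_overtime": (1,),
    "elixir": (2,), "crowns": (2,),
    "hand_costs": (4,), "hand_types": (4,),
    "next_card_cost": (1,), "next_card_type": (1,),
    "towers": (6, 4), "entities": (64, 7), "num_entities": (1,),
}


def flat_obs_size(shapes: dict) -> int:
    """Length of the vector obtained by flattening and concatenating every key."""
    return sum(prod(shape) for shape in shapes.values())
```

Under that assumption the MlpPolicy input has 490 features, 448 of which come from the padded entities array.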

Integration Tests

These tests require the Java bridge server to be running:

# Start server in one terminal
./gradlew :gym-bridge:run

# Run tests in another
CRFORGE_INTEGRATION=1 pytest python/tests/ -v
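The CRFORGE_INTEGRATION variable acts as an opt-in switch. One common way to wire such a gate, sketched with the standard library's unittest.skipUnless (pytest honors it too); the class and test names are hypothetical and may not match what lives in python/tests/, and the port check simply opens a TCP connection to the default bridge address.

```python
import os
import socket
import unittest


def integration_enabled(environ=os.environ) -> bool:
    """True only when the caller opted in with CRFORGE_INTEGRATION=1."""
    return environ.get("CRFORGE_INTEGRATION") == "1"


@unittest.skipUnless(integration_enabled(), "set CRFORGE_INTEGRATION=1 and start the bridge server")
class BridgeReachabilityTest(unittest.TestCase):
    def test_bridge_port_accepts_connections(self):
        # Plain TCP check against the default endpoint tcp://localhost:9876.
        with socket.create_connection(("localhost", 9876), timeout=2):
            pass
```

Without the variable set, the class is reported as skipped rather than failing on a missing server.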

Architecture

  • Java bridge (gym-bridge/): ZMQ PAIR server, wraps GameEngine in a step/reset API
  • Python bridge (crforge_gym/bridge.py): ZMQ PAIR client, JSON protocol
  • Gymnasium env (crforge_gym/env.py): Wraps bridge client in standard Gym interface
  • Wrappers (crforge_gym/wrappers.py): FlattenedObsWrapper for SB3 compatibility
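The Python bridge exchanges JSON messages over the PAIR socket. The sketch below shows what encoding reset/step requests could look like; the message shapes ("type", "seed", "action") are hypothetical, since the actual field names are defined by bridge.py and the Java gym-bridge module.

```python
import json
from typing import Optional

# Hypothetical message shapes; the real protocol is defined by the bridge code.


def encode_reset(seed: Optional[int] = None) -> bytes:
    return json.dumps({"type": "reset", "seed": seed}).encode("utf-8")


def encode_step(action) -> bytes:
    # MultiDiscrete samples are often numpy arrays; convert to plain ints for JSON.
    return json.dumps({"type": "step", "action": [int(a) for a in action]}).encode("utf-8")


def decode_reply(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))
```

With pyzmq, a client would create a socket via zmq.Context().socket(zmq.PAIR), connect it to tcp://localhost:9876, send encode_step(...) and pass socket.recv() to decode_reply. A PAIR socket talks to exactly one peer, which fits the one-env-per-server design.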