This project studies how cooperative behavior emerges and stabilizes in a spatial, resource-limited ecosystem by combining within-lifetime multi-agent reinforcement learning with population-level ecological and evolutionary dynamics. It explores the interplay between nature (inherited traits via reproduction and mutation) and nurture (behavior learned via reinforcement learning). Multi-Agent Deep Reinforcement Learning (MADRL) is coupled with evolutionary dynamics to study emergent behavior in a dynamic ecosystem of Predators, Prey, and regenerating Grass. Agents differ in speed, vision, energy metabolism, and decision policy, offering ground for open-ended adaptation. At its core lies a gridworld simulation where agents are not just trained: they are born, age, reproduce, die, and even mutate in a continuously changing environment.
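The nature side of that interplay can be pictured as a small set of heritable traits that are copied, and occasionally perturbed, when an agent reproduces. The sketch below is purely illustrative and not taken from the repository: the trait names, mutation rate, and `mutate` helper are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical heritable traits; the real environment defines its own set.
@dataclass
class Traits:
    speed: float        # grid cells the agent may move per step
    vision: int         # radius of the observed neighborhood
    metabolism: float   # energy burned per timestep

def mutate(parent: Traits, rate: float = 0.1) -> Traits:
    """Return a child's traits: a copy of the parent's, each perturbed with probability `rate`."""
    jitter = lambda v: v * (1 + random.uniform(-0.2, 0.2))
    return Traits(
        speed=jitter(parent.speed) if random.random() < rate else parent.speed,
        vision=max(1, round(jitter(parent.vision))) if random.random() < rate else parent.vision,
        metabolism=jitter(parent.metabolism) if random.random() < rate else parent.metabolism,
    )

child_traits = mutate(Traits(speed=1.0, vision=3, metabolism=0.05))
```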
Emerging human cooperative hunting of Mammoths
- Mammoth hunting: Mammoths are only hunted down and eaten by the human(s) in their Moore neighborhood if the cumulative human energy is strictly larger than the mammoth's energy. On failure (when the cumulative human energy is too low), humans optionally lose energy proportional to their share of the attacking group's energy (`energy_percentage_loss_per_failed_attacked_prey`). On success, the prey's energy is split among the attackers, proportionally by default or as an equal split via `team_capture_equal_split` (see the sketch after this list). (implementation)
- Base environment: The two-policy base environment. (results)
- Mutating agents: A four-policy extension of the base environment. (results)
- Centralized training: A single-policy variant of the base environment.
- Walls occlusion: An extension with walls and occluded vision.
- Reproduction kick-back rewards: On top of direct reproduction rewards, agents receive indirect rewards when their children reproduce.
- Lineage rewards: On top of direct reproduction rewards, agents receive rewards when their offspring survive over time.
- Shared prey: This environment is very similar in logic to mammoth hunting, but in this case the typical energy level of a prey is smaller than that of a predator. With mammoth hunting this is typically the other way around: prey possess more energy than predators.
- Testing the Red Queen Hypothesis in the co-evolutionary setting of (non-mutating) predators and prey. (implementation, results)
- Testing the Red Queen Hypothesis in the co-evolutionary setting of mutating predators and prey. (implementation, results)
- Hyperparameter tuning of the base environment with Population-Based Training. (implementation)
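To make the group-capture rule of the mammoth-hunting environment concrete, here is a minimal sketch. It is not the repository's implementation: the function name, its arguments, and the exact base of the failure penalty are assumptions; only the rule itself (cumulative attacker energy must strictly exceed the prey's energy, optional loss on failure, proportional or equal split on success) follows the description above.

```python
def resolve_group_capture(attacker_energies, prey_energy,
                          loss_fraction_on_failure=0.0,
                          team_capture_equal_split=False):
    """Hypothetical sketch of the cooperative capture rule.

    attacker_energies: energies of the humans in the prey's Moore neighborhood.
    Returns the energy change for each attacker, in the same order.
    """
    total = sum(attacker_energies)
    if total > prey_energy:  # success requires strictly larger cumulative energy
        if team_capture_equal_split:
            gains = [prey_energy / len(attacker_energies)] * len(attacker_energies)
        else:
            # proportional split: each attacker's share of the group's energy
            gains = [prey_energy * e / total for e in attacker_energies]
        return gains
    # failure: each attacker optionally loses energy, weighted by its share of the
    # group's energy (the exact base of the percentage is an assumption of this sketch)
    return [-loss_fraction_on_failure * e * (e / total) for e in attacker_energies]
```

With the proportional split, stronger attackers gain more on success but also stand to lose more on a failed attack, which makes joining a hunt a non-trivial choice for a learning agent.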
Editor used: Visual Studio Code 1.107.0 on Linux Mint 22.0 Cinnamon
- Clone the repository: `git clone https://github.com/doesburg11/PredPreyGrass.git`
- Open Visual Studio Code and execute:
- Press `ctrl+shift+p`
- Type and choose: "Python: Create Environment..."
- Choose environment: Conda
- Choose interpreter: Python 3.11.13 or higher
- Open a new terminal
- Run `pip install -e .` and press Enter
- Install the additional system dependency for Pygame visualization: `conda install -y -c conda-forge gcc=14.2.0`
- Run the pre-trained policy in a Visual Studio Code terminal: `python ./src/predpreygrass/rllib/base_environment/evaluate_ppo_from_checkpoint_debug.py`
- Or run a random policy: `python ./src/predpreygrass/rllib/base_environment/random_policy.py`
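For orientation, a random-policy rollout boils down to a loop of roughly this shape. This is a hedged sketch rather than the repository's actual script: the import path, class name, constructor signature, and the `action_spaces` attribute are assumptions; only the RLlib-style multi-agent reset/step interface is presumed.

```python
# Hypothetical sketch of a random-policy rollout; the import path, class name,
# and config contents are assumptions, not the repository's actual API.
from predpreygrass.rllib.base_environment.predpreygrass_rllib_env import PredPreyGrass

env = PredPreyGrass(config={})          # assumed constructor signature
observations, _ = env.reset(seed=42)    # RLlib-style multi-agent reset

done = False
while not done:
    # sample one random action per currently observed (alive) agent
    actions = {agent_id: env.action_spaces[agent_id].sample()   # assumed attribute
               for agent_id in observations}
    observations, rewards, terminateds, truncateds, _ = env.step(actions)
    done = terminateds.get("__all__", False) or truncateds.get("__all__", False)
```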
