diff --git a/AGENTS.md b/AGENTS.md
index 5b13567ec..a8e650cc2 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -71,7 +71,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
     async def correct_answer(completion, answer) -> float:
         completion_ans = completion[-1]['content']
         return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
     env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
     return env
 ```
@@ -88,7 +88,7 @@ prime env install primeintellect/math-python
 To run a local evaluation with any OpenAI-compatible model, do:
 
 ```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
 ```
 
 Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.py`.
diff --git a/README.md b/README.md
index 98101bf72..d58cff1cd 100644
--- a/README.md
+++ b/README.md
@@ -109,7 +109,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
     async def correct_answer(completion, answer) -> float:
         completion_ans = completion[-1]['content']
         return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
     env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
     return env
 ```
@@ -126,7 +126,7 @@ prime env install primeintellect/math-python
 To run a local evaluation with any OpenAI-compatible model, do:
 
 ```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
 ```
 
 Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.py`.
@@ -147,17 +147,17 @@ prime eval run primeintellect/math-python
 
 ## Documentation
 
-**[Environments](environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
+**[Environments](docs/environments.md)** — Create datasets, rubrics, and custom multi-turn interaction protocols.
 
-**[Evaluation](evaluation.md)** - Evaluate models using your environments.
+**[Evaluation](docs/evaluation.md)** - Evaluate models using your environments.
 
-**[Training](training.md)** — Train models in your environments with reinforcement learning.
+**[Training](docs/training.md)** — Train models in your environments with reinforcement learning.
 
-**[Development](development.md)** — Contributing to verifiers
+**[Development](docs/development.md)** — Contributing to verifiers
 
-**[API Reference](reference.md)** — Understanding the API and data structures
+**[API Reference](docs/reference.md)** — Understanding the API and data structures
 
-**[FAQs](faqs.md)** - Other frequently asked questions.
+**[FAQs](docs/faqs.md)** - Other frequently asked questions.
 
 ## Citation
 
diff --git a/docs/development.md b/docs/development.md
index bcb93ae1b..b6912be9c 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -194,7 +194,7 @@ prime env init my-environment
 prime env install my-environment
 
 # Test your environment
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 ### Environment Module Structure
@@ -248,10 +248,10 @@ uv run ruff check --fix .                   # Fix lint errors
 uv run pre-commit run --all-files           # Run all pre-commit hooks
 
 # Environment tools
-prime env init new-env                      # Create environment
-prime env install new-env                   # Install environment
-prime eval run new-env -m gpt-4.1-mini -n 5 # Test environment
-prime eval tui                              # Browse eval results
+prime env init new-env                             # Create environment
+prime env install new-env                          # Install environment
+prime eval run new-env -m openai/gpt-4.1-mini -n 5 # Test environment
+prime eval tui                                     # Browse eval results
 ```
 
 ### CLI Tools
diff --git a/docs/evaluation.md b/docs/evaluation.md
index e8176b382..417738887 100644
--- a/docs/evaluation.md
+++ b/docs/evaluation.md
@@ -24,7 +24,7 @@ Environments must be installed as Python packages before evaluation. From a loca
 
 ```bash
 prime env install my-env # installs ./environments/my_env as a package
-prime eval run my-env -m gpt-4.1-mini -n 10
+prime eval run my-env -m openai/gpt-4.1-mini -n 10
 ```
 
 `prime eval` imports the environment module using Python's import system, calls its `load_environment()` function, runs 5 examples with 3 rollouts each (the default), scores them using the environment's rubric, and prints aggregate metrics.
diff --git a/docs/faqs.md b/docs/faqs.md
index 47f9d1370..d64ca9c48 100644
--- a/docs/faqs.md
+++ b/docs/faqs.md
@@ -7,7 +7,7 @@
 Use `prime eval run` with a small sample:
 
 ```bash
-prime eval run my-environment -m gpt-4.1-mini -n 5
+prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 The `-s` flag prints sample outputs so you can see what's happening.
@@ -32,7 +32,7 @@ vf.print_prompt_completions_sample(outputs, n=3)
 Set the `VF_LOG_LEVEL` environment variable:
 
 ```bash
-VF_LOG_LEVEL=DEBUG prime eval run my-environment -m gpt-4.1-mini -n 5
+VF_LOG_LEVEL=DEBUG prime eval run my-environment -m openai/gpt-4.1-mini -n 5
 ```
 
 ## Environments
diff --git a/docs/overview.md b/docs/overview.md
index d0fa11a15..94dd2159d 100644
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -67,7 +67,7 @@ def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
     async def correct_answer(completion, answer) -> float:
         completion_ans = completion[-1]['content']
         return 1.0 if completion_ans == answer else 0.0
-    rubric = Rubric(funcs=[correct_answer])
+    rubric = vf.Rubric(funcs=[correct_answer])
     env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
     return env
 ```
@@ -84,7 +84,7 @@ prime env install primeintellect/math-python
 To run a local evaluation with any OpenAI-compatible model, do:
 
 ```bash
-prime eval run my-env -m gpt-5-nano # run and save eval results locally
+prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally
 ```
 
 Evaluations use [Prime Inference](https://docs.primeintellect.ai/inference/overview) by default; configure your own API endpoints in `./configs/endpoints.py`.
diff --git a/verifiers/AGENTS.md b/verifiers/AGENTS.md
index 52f2378c4..39d9b8a7d 100644
--- a/verifiers/AGENTS.md
+++ b/verifiers/AGENTS.md
@@ -109,7 +109,7 @@ vf-init new-environment
 
 # Install + test
 prime env install new-environment
-prime eval run new-environment -n 5 -m gpt-4.1-mini
+prime eval run new-environment -n 5 -m openai/gpt-4.1-mini
 ```
 
 ### Requirements
diff --git a/verifiers/scripts/init.py b/verifiers/scripts/init.py
index f0e331a1a..5527239cb 100644
--- a/verifiers/scripts/init.py
+++ b/verifiers/scripts/init.py
@@ -34,7 +34,7 @@
 
 ```bash
 prime eval run {env_id_dash} \
-    -m gpt-4.1-mini \
+    -m openai/gpt-4.1-mini \
     -n 20 -r 3 -t 1024 -T 0.7 \
     -a '{{"key": "value"}}' # env-specific args as JSON
 ```