OpenJudge is a unified framework designed to drive LLM and Agent application excellence through Holistic Evaluation and Quality Rewards.
💡 Evaluation and reward signals are the cornerstones of application excellence. Holistic evaluation enables the systematic analysis of shortcomings to drive rapid iteration, while high-quality rewards provide the essential foundation for advanced optimization and fine-tuning.
OpenJudge unifies evaluation metrics and reward signals into a single, standardized Grader interface, offering pre-built graders, flexible customization, and seamless framework integration.
Access 50+ production-ready graders, organized in a comprehensive taxonomy and rigorously validated for reliable performance.
| Domain | Focus |
|---|---|
| Text & General | Semantic quality, functional correctness, structural compliance |
| Agent | Agent lifecycle, tool calling, memory, plan feasibility, trajectory quality |
| Multimodal | Image-text coherence, visual generation quality, image helpfulness |

Each domain ships with a set of key graders; see the grader taxonomy in the documentation for the full list.
- **Multi-Scenario Coverage**: Extensive support for diverse domains including agent, text, code, math, and multimodal tasks. → Explore Supported Scenarios
- **Holistic Agent Evaluation**: Beyond final outcomes, we assess the entire lifecycle, including trajectories, memory, reflection, and tool use. → Agent Lifecycle Evaluation
- ✅ **Quality Assurance**: Every grader comes with benchmark datasets and pytest integration for validation; a hedged test sketch follows this list. → View Benchmark Datasets
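For illustration, here is what a validation test in the spirit of the Quality Assurance point might look like. This is a hedged sketch, not the project's actual test suite: the benchmark file path, its JSONL schema, and the one-point tolerance are our own assumptions; only the `RelevanceGrader`/`aevaluate` surface is taken from the Quickstart below, and running async tests requires the `pytest-asyncio` plugin.

```python
# Hypothetical pytest check that a grader reproduces benchmark labels.
# Assumptions: the RelevanceGrader API from the Quickstart, a local JSONL
# benchmark file, and pytest-asyncio for running async tests.
import json

import pytest

from openjudge.models import OpenAIChatModel
from openjudge.graders.common.relevance import RelevanceGrader


@pytest.mark.asyncio
async def test_relevance_grader_matches_benchmark():
    grader = RelevanceGrader(model=OpenAIChatModel(model="qwen3-32b"))

    # Hypothetical benchmark file: one {"query", "response", "expected_score"}
    # object per line.
    with open("data/relevance_benchmark.jsonl") as f:
        cases = [json.loads(line) for line in f]

    for case in cases:
        result = await grader.aevaluate(query=case["query"], response=case["response"])
        # Allow a one-point tolerance on the 1-5 scale shown in the Quickstart.
        assert abs(result.score - case["expected_score"]) <= 1
```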
Choose the build method that fits your requirements:
- **Customization**: Easily extend or modify pre-defined graders to fit your specific needs; a minimal sketch follows this list. → Custom Grader Development Guide
- **Data-Driven Rubrics**: Have a few examples but no clear rules? Use our tools to automatically generate white-box evaluation criteria (rubrics) from your data. → Automatic Rubric Generation Tutorial
- **Training Judge Models** (Coming Soon): For high-scale and specialized scenarios, we are developing the capability to train dedicated judge models. Support for SFT, Bradley-Terry models, and reinforcement learning workflows is on the way to help you build high-performance, domain-specific graders.
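To make the customization path concrete, here is a minimal, self-contained sketch that mimics the grader surface shown in the Quickstart below (`aevaluate` returning an object with `score` and `reason`). The `GraderResult` dataclass and `KeywordCoverageGrader` are our own illustrative stand-ins, not OpenJudge API; a real custom grader would extend the base class described in the Custom Grader Development Guide.

```python
# Hedged sketch of a custom grader. GraderResult and KeywordCoverageGrader
# are illustrative stand-ins, not confirmed OpenJudge classes.
from dataclasses import dataclass


@dataclass
class GraderResult:
    score: float
    reason: str


class KeywordCoverageGrader:
    """Scores a response by how many required keywords it mentions."""

    def __init__(self, keywords: list[str]):
        self.keywords = [k.lower() for k in keywords]

    async def aevaluate(self, query: str, response: str) -> GraderResult:
        text = response.lower()
        hits = [k for k in self.keywords if k in text]
        coverage = len(hits) / max(len(self.keywords), 1)
        return GraderResult(
            score=round(1 + 4 * coverage),  # map coverage onto the 1-5 scale
            reason=f"Matched {len(hits)}/{len(self.keywords)} keywords: {hits}",
        )
```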
We're actively building seamless connectors for mainstream observability platforms and training frameworks. Stay tuned! → See Integrations
- **2025-12-26** - Released OpenJudge v0.2.0 on PyPI - Major update! This release expands our core capabilities by adding robust support for diverse evaluation scenarios on top of reward construction. By unifying reward and evaluation signals, OpenJudge v0.2.0 provides a more holistic approach to optimizing application performance. → migration-guide
- **2025-10-20** - Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling - We released a new paper on learning generalizable reward criteria for robust modeling.
- **2025-10-17** - Taming the Judge: Deconflicting AI Feedback for Stable Reinforcement Learning - We introduced techniques to align judge feedback and improve RL stability.
- **2025-07-09** - Released OpenJudge v0.1.0 on PyPI
```bash
pip install py-openjudge
```

💡 More installation methods can be found in the Quickstart Guide.
```python
import asyncio

from openjudge.models import OpenAIChatModel
from openjudge.graders.common.relevance import RelevanceGrader


async def main():
    # 1️⃣ Create model client
    model = OpenAIChatModel(model="qwen3-32b")

    # 2️⃣ Initialize grader
    grader = RelevanceGrader(model=model)

    # 3️⃣ Prepare data
    data = {
        "query": "What is machine learning?",
        "response": "Machine learning is a subset of AI that enables computers to learn from data.",
    }

    # 4️⃣ Evaluate
    result = await grader.aevaluate(**data)
    print(f"Score: {result.score}")  # Score: 5
    print(f"Reason: {result.reason}")


if __name__ == "__main__":
    asyncio.run(main())
```

The complete quickstart walkthrough can be found in the Quickstart Guide.
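Because every grader returns a numeric score, the same call that powers evaluation can also supply a reward signal. The sketch below is illustrative only: the batching helper and the [0, 1] reward mapping are our own, and only the `RelevanceGrader`/`aevaluate` surface from the Quickstart above is taken from OpenJudge.

```python
# Hedged sketch: turning grader scores into rewards for a batch of rollouts.
# Only RelevanceGrader/aevaluate are OpenJudge API; the rest is illustrative.
import asyncio

from openjudge.models import OpenAIChatModel
from openjudge.graders.common.relevance import RelevanceGrader


async def score_batch(samples: list[dict]) -> list[float]:
    grader = RelevanceGrader(model=OpenAIChatModel(model="qwen3-32b"))
    results = await asyncio.gather(
        *(grader.aevaluate(query=s["query"], response=s["response"]) for s in samples)
    )
    # Map the 1-5 score onto [0, 1] so it can be consumed as an RL reward.
    return [(r.score - 1) / 4 for r in results]


if __name__ == "__main__":
    rollouts = [
        {"query": "What is machine learning?",
         "response": "ML lets computers learn from data."},
    ]
    print(asyncio.run(score_batch(rollouts)))
```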
Seamlessly connect OpenJudge with mainstream observability and training platforms, with more integrations on the way:
| Category | Status | Platforms |
|---|---|---|
| Observability | 🟡 In Progress | LangSmith, LangFuse, Arize Phoenix |
| Training | 🔵 Planned | verl, Trinity-RFT |
💬 Have a framework you'd like us to prioritize? Open an Issue!
We love your input! We want to make contributing to OpenJudge as easy and transparent as possible.
🎨 **Adding New Graders**: Have domain-specific evaluation logic? Share it with the community!
🐛 **Reporting Bugs**: Found a glitch? Help us fix it by opening an issue.
📖 **Improving Docs**: Clearer explanations or better examples are always welcome.
💡 **Proposing Features**: Have ideas for new integrations? Let's discuss!
→ See the full Contributing Guidelines for coding standards and the PR process.
OpenJudge was previously distributed as the legacy package `rm-gallery` (v0.1.x). Starting from v0.2.0, it is published as `py-openjudge` and the Python import namespace is `openjudge`.
OpenJudge v0.2.0 is NOT backward compatible with v0.1.x.
If you are currently using v0.1.x, choose one of the following paths:
- **Stay on v0.1.x (legacy)**: keep using the old package (`pip install rm-gallery`). We preserved the source code of v0.1.7 (the latest v0.1.x release) in the `v0.1.7-legacy` branch.
- **Migrate to v0.2.0 (recommended)**: follow the Installation section above, then walk through the Quickstart (or the full Quickstart Guide) to update your imports and usage; a sketch of the import change follows this list.
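For illustration, the import change might look like the following. This is a hedged sketch: the legacy `rm_gallery` module path is an assumption based on the old package name, and the new path is taken from the Quickstart above; verify both against your installed versions.

```python
# Before (v0.1.x, package `rm-gallery`) -- the exact legacy path is an
# assumption inferred from the package name; check your v0.1.x code:
# from rm_gallery... import ...

# After (v0.2.0+, package `py-openjudge`, namespace `openjudge`):
from openjudge.graders.common.relevance import RelevanceGrader
```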
If you run into migration issues, please open an issue with your minimal repro and current version.
If you use OpenJudge in your research, please cite:

```bibtex
@software{openjudge,
  title  = {OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards},
  author = {{The OpenJudge Team}},
  url    = {https://github.com/modelscope/OpenJudge},
  month  = {07},
  year   = {2025}
}
```

Made with ❤️ by the OpenJudge Team
