A multi-agent system for automated code review, built with CrewAI.
# Install dependencies
poetry install
# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys:
# - LLM_PROVIDER (openai or anthropic)
# - OPENAI_API_KEY (required if LLM_PROVIDER=openai)
# - ANTHROPIC_API_KEY (required if LLM_PROVIDER=anthropic)
# - GITHUB_TOKEN (required for dataset collection)
# Run a review (local path)
poetry run python -m app.cli review \
--pr-id "123" \
--title "Your PR Title" \
--language python \
/path/to/repo
# Or use GitHub URL directly (title/description auto-fetched)
poetry run python -m app.cli review \
--pr-id "14468" \
--language python \
"https://github.com/fastapi/fastapi"
# Supported languages: python, javascript, typescript, java, go, rust, cpp, csharp, ruby, php

- Multi-Agent System: 7 specialized agents (context, security, style, logic, performance, docs, tests)
- Evidence-Based: All findings require tool output or code references
- Evaluation Framework: Statistical analysis and LaTeX export
- Tool Integration: Git, Ruff (Python), ESLint (JS/TS), Semgrep, Bandit, Coverage.py
- Actionable: Auto-patches for simple fixes, detailed guidance for complex issues
- Cost Tracking: Real-time token usage and cost estimation for OpenAI and Anthropic
- Multi-Provider: Support for both OpenAI and Anthropic LLMs
┌─────────────┐
│     CLI     │  poetry run python -m app.cli review ...
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ ReviewFlow  │  Orchestrates the entire process
└──────┬──────┘
       │
       ├──► 1️⃣ Context Builder (Git diff + Tools)
       │
       ├──► 2️⃣ Analysis Agents (Parallel)
       │       ├─ ChangeContextAnalyst (LLM)
       │       ├─ SecurityReviewer (Tool)
       │       ├─ StyleFormatReviewer (Tool)
       │       ├─ LogicBugReviewer (LLM)
       │       ├─ PerformanceReviewer (LLM)
       │       ├─ DocumentationReviewer (LLM)
       │       └─ TestCoverageReviewer (Hybrid)
       │
       ├──► 3️⃣ RevisionProposer (Patch generation)
       │
       ├──► 4️⃣ Supervisor (Consolidation)
       │
       └──► 5️⃣ PRReviewResult (Final output)
- Extract git diff between PR branch and base branch
- Run language-specific tools (automatically selected based on the `--language` parameter):
  - Python: Ruff (linting), Bandit (security)
  - JavaScript/TypeScript: ESLint (linting)
  - All languages: Semgrep (security, language-agnostic)
- Build `PRContext` with all information
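The mapping from `--language` to tools lives in the context builder; below is a minimal sketch of the idea, assuming a `Language` enum and simple tool-runner callables (the names `run_ruff`, `select_tools`, etc. are illustrative, not the project's actual API):

```python
from enum import Enum
from typing import Callable

class Language(str, Enum):
    PYTHON = "python"
    JAVASCRIPT = "javascript"
    TYPESCRIPT = "typescript"
    # ... remaining supported languages

# Placeholder tool runners; each returns a list of raw findings.
def run_ruff(repo_path: str) -> list[dict]: ...
def run_bandit(repo_path: str) -> list[dict]: ...
def run_eslint(repo_path: str) -> list[dict]: ...
def run_semgrep(repo_path: str) -> list[dict]: ...

# Per-language tools; Semgrep is appended for every language.
LANGUAGE_TOOLS: dict[Language, list[Callable[[str], list[dict]]]] = {
    Language.PYTHON: [run_ruff, run_bandit],
    Language.JAVASCRIPT: [run_eslint],
    Language.TYPESCRIPT: [run_eslint],
}

def select_tools(language: Language) -> list[Callable[[str], list[dict]]]:
    return LANGUAGE_TOOLS.get(language, []) + [run_semgrep]
```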
7 specialized agents analyze the PR in parallel:
- ChangeContextAnalyst: Checks PR title/description consistency
- SecurityReviewer: Finds security vulnerabilities
- StyleFormatReviewer: Detects style/formatting issues
- LogicBugReviewer: Identifies logical errors
- PerformanceReviewer: Finds performance bottlenecks
- DocumentationReviewer: Checks documentation quality
- TestCoverageReviewer: Analyzes test coverage
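With ENABLE_PARALLEL_AGENTS=true the seven agents run concurrently. A minimal sketch of that fan-out, assuming each agent exposes an async `analyze(context)` method returning a list of findings (the interface is an assumption, not the project's actual agent API):

```python
import asyncio

async def run_analysis_agents(agents, context):
    """Run all analysis agents concurrently and flatten their findings."""
    results = await asyncio.gather(
        *(agent.analyze(context) for agent in agents),
        return_exceptions=True,  # one failing agent should not sink the whole review
    )
    findings = []
    for result in results:
        if isinstance(result, Exception):
            continue  # in the real flow: log the failure and move on
        findings.extend(result)
    return findings
```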
Generates patches for findings that need fixes.
- Consolidates all findings
- Removes duplicates
- Prioritizes by severity
- Applies nit limits
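A rough sketch of that consolidation pass; the dedup key, severity labels, and ordering below are assumptions for illustration, and the nit cap corresponds to MAX_NITS_PER_REVIEW from the configuration:

```python
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2, "nit": 3}  # assumed labels

def consolidate(findings: list[dict], max_nits: int = 5) -> list[dict]:
    # Drop duplicates that point at the same file/line/message.
    seen, unique = set(), []
    for f in findings:
        key = (f["file"], f.get("line"), f["message"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    # Most severe first.
    unique.sort(key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))
    # Keep at most `max_nits` nit-level findings (MAX_NITS_PER_REVIEW).
    nits_kept, result = 0, []
    for f in unique:
        if f["severity"] == "nit":
            if nits_kept >= max_nits:
                continue
            nits_kept += 1
        result.append(f)
    return result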
Creates final PRReviewResult with:
- Findings grouped by severity
- Markdown review comment
- JSON output for evaluation
- Metrics (time, cost, token usage)
- Real-time cost estimation based on provider and model
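The real domain models live in domain/models.py; the sketch below only illustrates the rough shape such Pydantic models could take (field names and severity values are assumptions, not the actual schema):

```python
from enum import Enum
from pydantic import BaseModel, Field

class Severity(str, Enum):            # assumed labels
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"
    NIT = "nit"

class Finding(BaseModel):
    agent: str                        # which reviewer produced it
    severity: Severity
    file: str
    line: int | None = None
    message: str
    evidence: str                     # tool output or code reference backing the claim
    suggested_patch: str | None = None

class PRReviewResult(BaseModel):
    pr_id: str
    findings: list[Finding] = Field(default_factory=list)
    markdown_comment: str = ""
    total_tokens: int = 0
    estimated_cost_usd: float = 0.0
    duration_seconds: float = 0.0
```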
.
├── agents/                  # Agent implementations
│   ├── base.py              # Base agent class
│   ├── change_context_analyst.py
│   ├── security_reviewer.py
│   ├── style_reviewer.py
│   ├── logic_reviewer.py
│   ├── performance_reviewer.py
│   ├── documentation_reviewer.py
│   ├── test_reviewer.py
│   ├── revision_proposer.py
│   └── supervisor.py
├── domain/                  # Domain models (Pydantic)
│   ├── models.py            # PRMetadata, Finding, Language enum, LLMProvider enum
│   └── __init__.py
├── tools/                   # Analysis tool integrations
│   ├── base.py              # Tool base class
│   ├── git_diff.py
│   ├── linters.py           # Ruff, ESLint
│   ├── security.py          # Semgrep, Bandit
│   └── coverage.py
├── flows/                   # Orchestration
│   ├── context_builder.py
│   └── review_flow.py
├── eval/                    # Evaluation framework
│   ├── metrics/
│   └── dataset/
├── app/                     # Application layer
│   ├── cli.py               # CLI interface
│   ├── config.py            # Settings
│   └── logging.py           # Structured logging
├── prompts/                 # Versioned prompts
│   ├── cca/
│   ├── security/
│   ├── style/
│   └── ...
└── reviews/                 # Review results storage
Key settings in .env:
# LLM Provider Selection
LLM_PROVIDER=anthropic # or "openai"
# OpenAI Configuration (if LLM_PROVIDER=openai)
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4-turbo-preview
OPENAI_TEMPERATURE=0.0
OPENAI_SEED=42
# Anthropic Configuration (if LLM_PROVIDER=anthropic)
# Recommended: claude-3-5-haiku-20241022 (best price-performance)
# Alternatives: claude-3-5-sonnet-20241022 (balanced), claude-3-opus-20240229 (highest quality)
ANTHROPIC_API_KEY=sk-ant-api03-...
ANTHROPIC_MODEL=claude-3-5-haiku-20241022
# GitHub (required for dataset collection and PR fetching)
GITHUB_TOKEN=ghp_...
# Review Configuration
MAX_NITS_PER_REVIEW=5
MAX_PATCH_LINES=10
ENABLE_PARALLEL_AGENTS=true
# Evaluation
EVAL_DATASET_PATH=./eval/dataset
EVAL_RESULTS_PATH=./eval/results
SEED_FOR_EXPERIMENTS=42

The framework supports both OpenAI and Anthropic LLM providers:
- OpenAI: GPT-4 Turbo, GPT-4, GPT-3.5 Turbo
- Anthropic:
- Claude 3.5 Haiku (recommended): Best price-performance ratio ($0.80-1.00/1M input, $4-5/1M output)
- Claude 3.5 Sonnet: Balanced performance ($3/1M input, $15/1M output)
- Claude 3 Opus: Highest quality ($15/1M input, $75/1M output)
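As an illustration of how the real-time cost estimation works, here is the arithmetic with the indicative rates quoted above (the project's actual pricing table and model list may differ):

```python
# USD per 1M tokens (input, output), taken from the rates listed above.
PRICING = {
    "claude-3-5-haiku-20241022": (1.00, 5.00),
    "claude-3-5-sonnet-20241022": (3.00, 15.00),
    "claude-3-opus-20240229": (15.00, 75.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a review using 120k input / 8k output tokens on Haiku:
# 120_000 * 1.00/1M + 8_000 * 5.00/1M = 0.12 + 0.04 = $0.16
```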
Set LLM_PROVIDER=anthropic or LLM_PROVIDER=openai in your .env file.
See .env.example for all available configuration options.
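Provider selection happens once at startup based on LLM_PROVIDER. A minimal sketch of such a factory using the official openai and anthropic Python SDKs; the project's actual wiring goes through CrewAI and may look different:

```python
import os

def make_llm_client():
    provider = os.environ.get("LLM_PROVIDER", "openai").lower()
    if provider == "anthropic":
        import anthropic
        return anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    if provider == "openai":
        import openai
        return openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")
```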
Collect real PRs from GitHub for evaluation:
# Configure GitHub token in .env
# GITHUB_TOKEN=ghp_your_token_here
# Collect balanced dataset
poetry run python eval/dataset/collect_dataset.py collect \
--repos 5 \
--prs-per-repo 5 \
--balanced

See eval/dataset/README.md for detailed instructions.
Run evaluation on collected dataset:
# Evaluate using stored reviews (recommended)
poetry run python -m app.cli evaluate \
--system multi_agent \
--use-stored
# Evaluate specific PRs
poetry run python -m app.cli evaluate \
--system multi_agent \
--pr-ids "14468,2779" \
--use-stored
# Re-run reviews and evaluate
poetry run python -m app.cli evaluate \
--system single_agent \
--rerun \
--repo-path /path/to/repo
# Compare systems
poetry run python -m app.cli compare \
./eval/results/evaluation_single_agent.json \
./eval/results/evaluation_multi_agent.json \
--latex results.tex

Evaluate whether multi-agent code review with tool integration achieves:
- Higher actionability (more patches/clear fixes)
- Lower noise (fewer false positives)
- Better coverage (detect more critical issues)
Compared to single-agent LLM baselines.
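One way to operationalize these criteria against a labeled PR dataset is sketched below; the metric definitions are assumptions for illustration, and the actual eval/metrics/ implementation may define them differently:

```python
def evaluation_metrics(findings: list[dict], labeled_issues: set[tuple]) -> dict:
    """Toy metrics for one labeled PR; findings are matched by (file, line)."""
    matched = [f for f in findings if (f["file"], f.get("line")) in labeled_issues]
    actionable = [f for f in findings if f.get("suggested_patch")]
    return {
        # Actionability: share of findings that ship a concrete patch.
        "actionability": len(actionable) / len(findings) if findings else 0.0,
        # Noise: share of findings matching no labeled ground-truth issue.
        "noise_rate": 1 - len(matched) / len(findings) if findings else 0.0,
        # Coverage: share of labeled issues the review actually surfaced.
        "coverage": len(matched) / len(labeled_issues) if labeled_issues else 0.0,
    }
```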
- SOLID: Single responsibility, dependency injection, clear abstractions
- DRY: Shared base classes, reusable components
- Evidence-Based: Every finding must cite tool output or code reference
- Reproducible: Deterministic settings, versioned prompts, pinned tools
- Type-Safe: Enum-based language and provider selection
- Cost-Aware: Real-time token tracking and cost estimation
# Run tests
poetry run pytest
# Lint
poetry run ruff check .
# Format
poetry run ruff format .

See CONTRIBUTING.md for contribution guidelines.
MIT