opustest is an agentic AI system that automatically analyzes Python codebases and produces a detailed quality report. It uses multiple specialized AI agents — built with Microsoft Agent Framework — that collaborate in a pipeline to score your code across four areas and flag every issue found.
The system uses Retrieval-Augmented Generation (RAG): before analyzing your code, it retrieves curated examples of good and bad Python code from an Azure Cosmos DB database, and uses those examples as the quality standards for its review.
You interact with opustest through a web-based UI where you enter a directory path to a Python codebase, watch real-time progress updates, and receive an HTML report.
- Developers who want automated code reviews against consistent standards
- Teams looking to enforce coding standards across projects
- Learners exploring multi-agent AI systems, RAG patterns, and Azure deployment
The generated HTML report contains:
- Scores (0–5) in four areas, plus a total out of 20:
  - Static Code Quality and Coding Standards
  - Functional Correctness
  - Handling of Known Errors
  - Handling of Unknown Errors
- An error table listing every issue found, with columns:
  - Error Found
  - File
  - Type of Error
  - Explanation (why it was flagged, referencing the RAG database examples)
  - Fix Prompt (a ready-to-use prompt for a coding assistant to fix the issue)
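The Fix Prompt column is essentially a templated string built from the other columns. A minimal sketch of such a template (the function and field names here are illustrative, not the report's actual schema):

```python
def build_fix_prompt(error: str, file: str, error_type: str, explanation: str) -> str:
    """Format a ready-to-use prompt for a coding assistant.

    Illustrative only: the real Report Generation Agent composes its own
    prompt text; this just shows the shape of such a template.
    """
    return (
        f"In {file}, fix the following {error_type} issue: {error}. "
        f"Context: {explanation} "
        "Apply the smallest change that resolves the issue and keep the existing style."
    )

prompt = build_fix_prompt(
    error="bare except clause",
    file="utils/io.py",
    error_type="error handling",
    explanation="A bare except swallows KeyboardInterrupt and hides real failures.",
)
```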
The system uses these agents, orchestrated sequentially:
- Code Example Retrieval Agent — Retrieves Python code examples (good/bad) from Azure Cosmos DB via RAG
- Codebase Import Agent — Reads all `.py` files from the user-specified directory
- Verification Agents (one per scoring area):
  - Code Quality and Coding Standards (score 0–5)
  - Functional Correctness (score 0–5)
  - Handling of Known Errors (score 0–5)
  - Handling of Unknown Errors (score 0–5)
- Report Generation Agent — Produces an HTML report with scores and an error table
An Orchestrator coordinates the pipeline and streams progress updates to the web UI via SSE.
```mermaid
flowchart TD
    subgraph UI["🌐 Web UI (Browser)"]
        A["Enter directory path\n& click Verify"]
    end
    subgraph Server["⚡ FastAPI Backend"]
        B["POST /api/verify\nSSE progress stream"]
    end
    subgraph Orchestrator["🎯 Orchestrator Agent"]
        direction TB
        C["Coordinates pipeline\n& streams progress"]
    end
    subgraph Stage1["📚 Stage 1: RAG Retrieval"]
        D["Code Example\nRetrieval Agent"]
        E[("Azure Cosmos DB\n(good/bad examples)")]
    end
    subgraph Stage2["📂 Stage 2: Import"]
        F["Codebase\nImport Agent"]
        G[("Local .py files")]
    end
    subgraph Stage3["🔍 Stage 3: Verification"]
        H["Code Quality\nAgent"]
        I["Functional\nCorrectness Agent"]
        J["Known Errors\nAgent"]
        K["Unknown Errors\nAgent"]
    end
    subgraph Stage4["📝 Stage 4: Report"]
        L["Report Generation\nAgent"]
    end
    M["📄 HTML Report\n(scores + error table)"]

    A -->|"directory path"| B
    B -->|"start pipeline"| C
    C --> D
    D <-->|"query Python examples"| E
    C --> F
    F <-->|"read .py files"| G
    C --> H & I & J & K
    C --> L
    L --> M
    M -->|"SSE stream"| B
    B -->|"progress + report"| A

    style UI fill:#e3f2fd,stroke:#1565c0,color:#000
    style Server fill:#fff3e0,stroke:#e65100,color:#000
    style Orchestrator fill:#f3e5f5,stroke:#6a1b9a,color:#000
    style Stage1 fill:#e8f5e9,stroke:#2e7d32,color:#000
    style Stage2 fill:#fff8e1,stroke:#f9a825,color:#000
    style Stage3 fill:#fce4ec,stroke:#c62828,color:#000
    style Stage4 fill:#e0f2f1,stroke:#00695c,color:#000
```
- You enter a directory path to a Python codebase in the Web UI and click Verify Codebase
- The Orchestrator starts the pipeline and streams progress back to the browser via Server-Sent Events (SSE)
- The Code Example Retrieval Agent queries Cosmos DB for Python examples (good and bad)
- The Codebase Import Agent reads all `.py` files from the directory
- Four Verification Agents each score one area (0–5) and list the issues found
- The Report Generation Agent compiles everything into an HTML report
- The report is displayed in the Web UI
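The flow above can be sketched as a sequential pipeline that threads a shared context through each stage and yields a progress event after each one. This is an illustrative skeleton, not the actual Microsoft Agent Framework orchestrator; the stage names mirror the list above, and in the real app the events are pushed to the browser over SSE.

```python
from typing import Callable, Iterator

def run_pipeline(stages: list[tuple[str, Callable[[dict], dict]]]) -> Iterator[dict]:
    """Run stages in order, passing an accumulating context dict through
    them and yielding a progress event as each stage completes."""
    context: dict = {}
    for name, stage in stages:
        context = stage(context)
        yield {"stage": name, "status": "done"}

# Illustrative stand-ins for the real agents:
stages = [
    ("rag_retrieval", lambda ctx: {**ctx, "examples": ["good", "bad"]}),
    ("codebase_import", lambda ctx: {**ctx, "files": ["app.py"]}),
    ("verification", lambda ctx: {**ctx, "scores": {"code_quality": 4}}),
    ("report", lambda ctx: {**ctx, "report": "<html>...</html>"}),
]
events = list(run_pipeline(stages))
```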
opustest/
├── backend/
│ ├── agents/
│ │ ├── code_example_retrieval.py # RAG retrieval from Cosmos DB
│ │ ├── codebase_import.py # Reads .py files from directory
│ │ ├── orchestrator.py # Coordinates the full pipeline
│ │ ├── report_generation.py # Generates the HTML report
│ │ └── verification/
│ │ ├── code_quality.py # Score: code quality & standards
│ │ ├── functional_correctness.py # Score: functional correctness
│ │ ├── known_errors.py # Score: known error handling
│ │ └── unknown_errors.py # Score: unknown error handling
│ ├── app.py # FastAPI server with SSE
│ ├── config.py # Environment variable loading
│ ├── cosmos_client.py # Cosmos DB query functions
│ └── git_utils.py # Git repo cloning for cloud mode
├── frontend/
│ ├── index.html # Web UI
│ ├── styles.css # Styling
│ └── app.js # SSE client & progress display
├── infra/
│ ├── main.bicep # Azure infra entry point
│ ├── main.parameters.json # azd parameter bindings
│ └── modules/
│ ├── acr.bicep # Azure Container Registry
│ ├── container-app.bicep # Container Apps Environment + App
│ └── cosmos.bicep # Cosmos DB account, database, container
├── scripts/
│ ├── deploy.ps1 # One-command deploy (PowerShell)
│ ├── deploy.sh # One-command deploy (Bash)
│ ├── postprovision.ps1 # azd hook: seeds Cosmos DB after provision (Windows)
│ ├── postprovision.sh # azd hook: seeds Cosmos DB after provision (Linux/macOS)
│ └── seed_cosmos.py # Populate sample code examples
├── azure.yaml # azd project definition
├── Dockerfile # Container image definition
├── requirements.txt # Python dependencies
└── .env.example # Template for environment variables
Each document in the code examples container has:
| Field | Description |
|---|---|
| `type` | `"good"` or `"bad"` |
| `language` | Programming language (only Python examples are used) |
| `severity` | `"low"`, `"medium"`, or `"high"` |
| `description` | Explanation of what is good or bad |
| `code` | The example code snippet |
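The Python-only filter mentioned above amounts to selecting documents by their `language` and `type` fields. A minimal sketch over plain dicts shaped like this schema; the real retrieval agent issues an equivalent Cosmos DB SQL query (e.g. `SELECT * FROM c WHERE c.language = 'Python'`) through `cosmos_client.py`, and the sample documents here are invented for illustration:

```python
def select_examples(docs: list[dict], kind: str) -> list[dict]:
    """Keep only Python examples of the requested kind ('good' or 'bad'),
    mirroring the RAG filter that ignores non-Python entries."""
    return [d for d in docs if d["language"] == "Python" and d["type"] == kind]

docs = [
    {"type": "good", "language": "Python", "severity": "low",
     "description": "Uses a context manager for file I/O",
     "code": "with open(path) as f:\n    data = f.read()"},
    {"type": "bad", "language": "Python", "severity": "high",
     "description": "Bare except hides errors",
     "code": "try:\n    run()\nexcept:\n    pass"},
    {"type": "bad", "language": "JavaScript", "severity": "medium",
     "description": "Ignored by the Python-only filter",
     "code": "var x = 1"},
]
good = select_examples(docs, "good")
bad = select_examples(docs, "bad")
```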
There are two ways to run opustest: locally (for development) or deployed to Azure Container Apps (for production). Both paths are covered below.
You will need:
| Tool | Purpose | Install link |
|---|---|---|
| Python 3.10+ | Run the backend and seed script | python.org |
| Azure CLI (`az`) | Authenticate and manage Azure resources | Install Azure CLI |
| Azure Developer CLI (`azd`) | Provision infrastructure from Bicep | Install azd |
| Docker | Build container images (cloud deploy only) | Get Docker |
You will also need:
- An Azure OpenAI deployment with the Responses API enabled
- An Azure subscription (for Cosmos DB and Container Apps)
Follow these steps to run opustest on your machine.
Step 1 — Clone and configure environment variables
```bash
git clone <your-repo-url>
cd opustest
cp .env.example .env
```

Open `.env` in a text editor and fill in your values:

```
AZURE_AI_PROJECT_ENDPOINT=https://<your-project>.openai.azure.com/
AZURE_AI_MODEL_DEPLOYMENT_NAME=gpt-4o
COSMOS_ENDPOINT=https://<your-account>.documents.azure.com:443/
COSMOS_KEY=<your-cosmos-key>
COSMOS_DATABASE_NAME=code-examples
COSMOS_CONTAINER_NAME=examples
```
Where do these values come from?
`AZURE_AI_PROJECT_ENDPOINT` and `AZURE_AI_MODEL_DEPLOYMENT_NAME` come from your Azure OpenAI resource. The Cosmos values come from provisioning (Step 3) or from an existing Cosmos DB account.
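A fail-fast loader for these variables might look like the sketch below. The repository's actual `config.py` may differ; this just illustrates validating the required settings at startup instead of failing mid-pipeline:

```python
import os

REQUIRED = [
    "AZURE_AI_PROJECT_ENDPOINT",
    "AZURE_AI_MODEL_DEPLOYMENT_NAME",
    "COSMOS_ENDPOINT",
    "COSMOS_KEY",
    "COSMOS_DATABASE_NAME",
    "COSMOS_CONTAINER_NAME",
]

def load_settings(env=None) -> dict[str, str]:
    """Return the required settings, raising early if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```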
Step 2 — Install Python dependencies
```bash
pip install --pre -r requirements.txt
```

The `--pre` flag is required because Microsoft Agent Framework is currently in preview.
Step 3 — Provision Cosmos DB and seed sample data
```bash
# Log in to Azure
az login
azd auth login

# Provision the Cosmos DB account, database, and container
azd provision

# Retrieve the Cosmos DB key and add it to your .env
az cosmosdb keys list \
  --name <COSMOS_ACCOUNT_NAME> \
  --resource-group <AZURE_RESOURCE_GROUP> \
  --query primaryMasterKey -o tsv
```

After provisioning, azd makes these outputs available:
| Output | Description |
|---|---|
| `COSMOS_ENDPOINT` | Cosmos DB account endpoint URL |
| `COSMOS_ACCOUNT_NAME` | Cosmos DB account name |
| `COSMOS_DATABASE_NAME` | Database name (`code-examples`) |
| `COSMOS_CONTAINER_NAME` | Container name (`examples`) |
Now populate the database with sample good/bad Python code examples:
```bash
python scripts/seed_cosmos.py
```

The script creates 19 sample documents:
- Good Python examples (8): PEP 8 naming, type hints, specific exception handling, context managers, input validation, defensive programming with logging
- Bad Python examples (9): bare except, poor naming, missing error handling, SQL injection, mutable defaults, swallowed exceptions, inconsistent return types
- Non-Python examples (2): one JavaScript and one Java entry that the RAG filter correctly ignores
Tip: When you deploy to Azure with `azd up`, the seed script runs automatically via the `postprovision` hook — you don't need to run it manually.
Step 4 — Start the server
```bash
uvicorn backend.app:app --reload
```

Step 5 — Use the app
Open http://localhost:8000 in your browser.
Enter the absolute path to a Python codebase directory (e.g. `C:\Users\you\my-project` or `/home/you/my-project`) and click Verify Codebase.
You will see real-time progress updates for each stage, and the final HTML report will be displayed when complete.
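Under the hood, those progress updates arrive over a Server-Sent Events stream, where each event is a line prefixed with `data:`. A minimal parser for such lines (illustrative; the actual event payload shape is defined by the backend, and the `"stage"` field here is an assumption):

```python
import json

def parse_sse_data(line: str):
    """Extract the JSON payload from a single 'data:' SSE line.
    Returns None for comments, keep-alives, and other SSE fields."""
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

event = parse_sse_data('data: {"stage": "verification", "status": "running"}')
```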
The recommended way to deploy is with azd up, which provisions all Azure resources, builds the Docker image, pushes it to ACR, deploys to Container Apps, and automatically seeds Cosmos DB with sample data via the postprovision hook.
```bash
# Log in to Azure
az login
azd auth login

# Set required environment variables
azd init
azd env set AZURE_AI_PROJECT_ENDPOINT "https://<your-project>.openai.azure.com/"
azd env set AZURE_AI_MODEL_DEPLOYMENT_NAME "gpt-4o"

# Provision, build, and deploy (seeds Cosmos DB automatically)
azd up
```

After deployment completes, azd prints the application URL.
Alternatively, you can use the one-command deploy scripts:
```powershell
.\scripts\deploy.ps1 `
  -EnvironmentName codeverify `
  -Location eastus `
  -AzureAiProjectEndpoint "https://<your-project>.openai.azure.com/"
```

```bash
chmod +x scripts/deploy.sh
./scripts/deploy.sh \
  --env-name codeverify \
  --location eastus \
  --ai-endpoint "https://<your-project>.openai.azure.com/"
```

Each script:

- Initialises an `azd` environment with your settings
- Provisions all Azure resources via Bicep (Cosmos DB, ACR, Container Apps)
- Builds the Docker image and pushes it to ACR
- Updates the Container App with the new image
- Seeds Cosmos DB with sample code examples (pass `-SkipSeed`/`--skip-seed` to skip)
- Prints the application URL
```bash
docker build -t code-verification:latest .
docker tag code-verification:latest <ACR_LOGIN_SERVER>/code-verification:latest
docker push <ACR_LOGIN_SERVER>/code-verification:latest
az containerapp update --name <APP_NAME> --resource-group <RG_NAME> --image <ACR_LOGIN_SERVER>/code-verification:latest
```

Once deployed, open the application URL printed by azd (e.g. https://ca-xxxxx.azurecontainerapps.io).
The app defaults to Git URL mode — paste any public HTTPS repository URL and click Verify Codebase:
```
https://github.com/user/repo
https://github.com/user/repo.git
https://dev.azure.com/org/project/_git/repo
```
The server clones the repository into a temporary directory (shallow clone, depth 1, 120 s timeout), runs all verification agents against the Python files, and streams progress back to your browser via Server-Sent Events. The cloned directory is cleaned up automatically after the report is generated.
Note: Only HTTPS Git URLs are accepted. SSH URLs (`git@github.com:…`) and `file://` URLs are rejected for security. Private repositories are supported only if the server's environment has Git credentials configured (e.g. via a Git credential helper).
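The HTTPS-only rule and shallow-clone behaviour described above can be sketched as follows. This is illustrative: the repository's actual logic lives in `git_utils.py` and may differ in details such as error handling and cleanup.

```python
import subprocess
import tempfile
from urllib.parse import urlparse

def validate_repo_url(url: str) -> None:
    """Reject anything that is not an https:// URL (no ssh, git@, or file://)."""
    if urlparse(url).scheme != "https":
        raise ValueError(f"Only HTTPS Git URLs are accepted: {url!r}")

def shallow_clone(url: str, timeout: int = 120) -> str:
    """Clone at depth 1 into a temporary directory and return its path.
    The caller is responsible for deleting the directory afterwards."""
    validate_repo_url(url)
    dest = tempfile.mkdtemp(prefix="opustest-")
    subprocess.run(
        ["git", "clone", "--depth", "1", url, dest],
        check=True, timeout=timeout, capture_output=True,
    )
    return dest
```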
You can also switch to Local directory mode using the toggle in the UI, which is useful when running locally but is not available for cloud-deployed instances (the container does not have access to your local filesystem).
```bash
docker build -t code-verification .
docker run -p 8000:8000 --env-file .env code-verification
```

Each area is scored from 0 (worst) to 5 (best). The total score is the sum of all four areas (max 20).
Code Quality and Coding Standards
| Score | Meaning |
|---|---|
| 0 | Code fails linting or contains syntax errors |
| 1 | Major formatting issues; inconsistent naming; poor structure |
| 2 | Passes basic linting but contains frequent style violations |
| 3 | Mostly compliant with coding standards; minor issues |
| 4 | Fully compliant with coding standards; clean and consistent structure |
| 5 | Fully compliant, idiomatic, and optimized for readability |
Functional Correctness
| Score | Meaning |
|---|---|
| 0 | Core functionality is broken or produces incorrect results |
| 1 | Major features malfunction; incorrect behavior is common |
| 2 | Basic functionality works, but edge cases frequently fail |
| 3 | Core functionality works as intended; minor bugs exist |
| 4 | Functionality is correct across typical and edge cases |
| 5 | Functionality is fully correct and robust across all expected scenarios |
Handling of Known Errors
| Score | Meaning |
|---|---|
| 0 | Known error conditions are not handled and cause crashes or undefined behavior |
| 1 | Minimal error handling; many known errors propagate unhandled |
| 2 | Some known errors are handled, but coverage is inconsistent |
| 3 | Most known error cases are handled with reasonable safeguards |
| 4 | All known error conditions are explicitly handled with clear recovery or messaging |
| 5 | Known errors are comprehensively handled with graceful recovery and clear diagnostics |
Handling of Unknown Errors
| Score | Meaning |
|---|---|
| 0 | Unexpected errors cause crashes, data corruption, or undefined behavior |
| 1 | Global error handling exists but provides little protection or visibility |
| 2 | Some safeguards exist, but unexpected failures are not consistently contained |
| 3 | Unexpected errors are generally contained and logged without crashing the system |
| 4 | Robust fallback mechanisms prevent most unknown errors from causing failures |
| 5 | System is resilient to unknown errors through defensive programming and comprehensive logging |
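The four rubric scores above combine by simple addition. A sketch of that aggregation, with bounds checking (the area keys are illustrative labels, not the report's actual field names):

```python
def total_score(scores: dict[str, int]) -> int:
    """Sum the four 0-5 area scores into the 0-20 total, rejecting
    out-of-range values."""
    for area, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{area} score {value} is outside 0-5")
    return sum(scores.values())

total = total_score({
    "code_quality": 4,
    "functional_correctness": 3,
    "known_errors": 5,
    "unknown_errors": 2,
})  # 14 out of 20
```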
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
This project is licensed under the MIT License.