Welcome to evalverse_engine — your plug-and-play, multi-agent LLM framework built for
evaluating, extracting, and enhancing AI-generated outputs across text, JSON, audio, and more.
This isn't just another LLM wrapper. This is LangChain x CrewAI x Groq/OpenAI/ChatLite, reimagined into a
modular system with decorator-powered orchestration, automatic agent registration, and support for multi-modal I/O pipelines.
Orchestrate your whole crew with a single decorator. `@LLM_Agent` injects runtime logic to auto-register and execute agents, with zero boilerplate. Supports:
🌐 Groq, 🔓 OpenAI, and ⚡ ChatLite
🔁 Auto method injection: `run_<agent_name>()`
✅ Built-in `run_all()` for batch execution
Just slap this on your agent methods and boom: the agent is registered, initialized, and LLM-bound automatically.
```python
@LLM_Agent("json_extractor")
def json_extractor(self):
    return {
        "role": "Parser",
        "goal": "Extract valid JSON from noisy LLM outputs",
        "backstory": "Trained on corrupted prompts and StackOverflow answers.",
        "description": lambda: "Clean and parse the given blob to return valid JSON.",
        "expected_output": "JSON object",
    }
```
Agents get dynamically hooked with:
- LLMs (via LangChain)
- Custom toolsets (functions, extractors, converters, etc.)
- Backstories, roles, and goals (for context-aware interaction)
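The registration-plus-injection pattern described above can be sketched in plain Python. The names `LLM_Agent`, `LLM_Driver`, `run_<agent_name>()`, and `run_all()` follow this README; the body below is an illustrative assumption, not the actual implementation:

```python
# Minimal sketch of decorator-based agent registration (illustrative only).
_REGISTRY = {}

def LLM_Agent(name):
    """Register the decorated agent factory under `name`."""
    def decorator(func):
        _REGISTRY[name] = func
        return func
    return decorator

class LLM_Driver:
    def __getattr__(self, attr):
        # Auto-injected run_<agent_name>() methods resolve via the registry.
        if attr.startswith("run_") and attr[4:] in _REGISTRY:
            return lambda: _REGISTRY[attr[4:]](self)
        raise AttributeError(attr)

    def run_all(self):
        # Batch-execute every registered agent.
        return {name: factory(self) for name, factory in _REGISTRY.items()}

@LLM_Agent("json_extractor")
def json_extractor(driver):
    return {"role": "Parser", "goal": "Extract valid JSON"}
```

With this sketch, `LLM_Driver().run_json_extractor()` resolves dynamically even though no such method was ever written by hand.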
Extract clean text from PDFs. Comes with:
- `.to_lower()` / `.to_upper()` for post-processing
- `__getitem__` support to index characters directly
- `.append_front()` and `.append_back()` to modify text dynamically
- `.write_to_path()` to save output
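A hypothetical sketch of the text-utility API listed above; the real class in `pdf2text.py` and its internals may differ, this only illustrates the chained-call style:

```python
class ExtractedText:
    """Toy stand-in for the pdf2text result object (illustrative only)."""

    def __init__(self, text):
        self.text = text

    def to_lower(self):
        self.text = self.text.lower()
        return self  # return self so calls can be chained

    def to_upper(self):
        self.text = self.text.upper()
        return self

    def __getitem__(self, index):
        # Direct character access, e.g. t[0]
        return self.text[index]

    def append_front(self, prefix):
        self.text = prefix + self.text
        return self

    def append_back(self, suffix):
        self.text = self.text + suffix
        return self

    def write_to_path(self, path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)
```

Usage: `ExtractedText("Resume").to_lower().append_front(">> ")` yields the text `">> resume"`.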
Record audio on the fly and transcribe using Whisper (Groq-powered). Perfect for converting voice notes to structured text.
```python
# Records 10 seconds of audio
record_audio("voice.wav", duration=10)

# Uses Whisper-large-v3 to transcribe
transcript = whisper_transcribe("voice.wav")
```
Extracts structured data from noisy LLM blobs. Includes:
- `extract_all_json_objects()`: returns a list of all detected JSON objects
- `extract_first_json_safe()`: only gets clean top-level JSON
- `extract_first_any_json()`: deeply scans for valid nested JSON
- `pretty_print_json()`: beautifies output for logs or responses
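As an illustration of the scanning approach such helpers typically use (an assumed implementation, not the actual `str2json.py` code), a noisy blob can be walked with `json.JSONDecoder.raw_decode`:

```python
import json

def extract_all_json_objects(blob):
    """Return every decodable top-level JSON object found in a noisy string."""
    decoder = json.JSONDecoder()
    results, i = [], 0
    while True:
        start = blob.find("{", i)
        if start == -1:
            return results
        try:
            obj, end = decoder.raw_decode(blob, start)
        except json.JSONDecodeError:
            i = start + 1  # not valid JSON here; keep scanning
        else:
            results.append(obj)
            i = end  # resume after the parsed object
```

`raw_decode` is handy here because it tolerates trailing junk after the object, which is exactly what chatty LLM output looks like.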
🔌 Plug in any LLM: Groq, OpenAI, ChatLite — all play nice.
🧩 Dynamic Agent Framework: No hardcoding, everything is modular.
📜 Config Driven: Easily switch models, keys, and endpoints via `llm_config.yaml`.
🧪 Perfect for Testing & Evaluation Pipelines: Especially in multi-agent flows.
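A plausible shape for `llm_config.yaml`; the key names and model string below are illustrative guesses, not the file's actual schema:

```yaml
# Illustrative llm_config.yaml shape (actual keys may differ)
provider: groq                      # groq | openai | chatlite
model: llama-3.1-70b-versatile      # example model id
api_key: "YOUR_API_KEY"             # placeholder, keep out of version control
endpoint: "https://api.groq.com/openai/v1"
```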
```
evalverse_engine/
├── core/
│   ├── driver.py          # Main orchestration logic (LLM_Driver, @LLM_Agent, config loader)
├── modules/
│   ├── pdf2text.py        # PDF text extractor with rich text utils
│   ├── str2json.py        # JSON cleanup from noisy blobs
│   ├── voice2text.py      # Whisper-based voice transcription
├── llm_config.yaml        # API keys + model settings
└── README.md              # You're reading it
```
Role: Extracts structured profile information from resumes or unstructured documents.
- Inputs: Raw candidate text (from PDF, etc.)
- Outputs: JSON with name, email, experience, skills, education, etc.
- Use Case: Resume screening, profile normalization
Role: Parses job descriptions to extract key technical and soft skill requirements.
- Inputs: Job description text
- Outputs: Structured JSON of job criteria
- Use Case: Job-candidate matching, interview prep
Role: Generates relevant interview questions based on the candidate profile and job requirements.
- Inputs: Candidate profile JSON, job requirement JSON
- Outputs: List of context-aware interview questions
- Use Case: Automated interview design, skill assessment
Role: Evaluates candidate answers using rubric-based criteria: correctness, depth, clarity, and conciseness.
- Inputs: Candidate answer, expected answer/rubric
- Outputs: Evaluation score + qualitative feedback
- Use Case: Technical interview grading, soft skill evaluation
Role: Dynamically generates new questions based on candidate's past performance and skill gaps.
- Inputs: Candidate response history, evaluation metrics
- Outputs: Follow-up or challenge questions with increased or decreased difficulty
- Use Case: Adaptive testing, personalized questioning
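The adaptive loop above boils down to simple difficulty bookkeeping. A sketch, where the 0-1 score scale, thresholds, and step size are assumptions for illustration:

```python
def next_difficulty(current, score, min_d=1, max_d=5,
                    pass_mark=0.7, fail_mark=0.4):
    """Raise difficulty after strong answers, lower it after weak ones.

    `score` is assumed to be a normalized evaluation score in [0, 1].
    """
    if score >= pass_mark:
        return min(current + 1, max_d)   # challenge question next
    if score < fail_mark:
        return max(current - 1, min_d)   # ease off
    return current  # borderline answers keep difficulty unchanged
```

The same scheme extends naturally to per-skill difficulty tracking by keeping one counter per skill gap.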
Role: Rates questions for quality (relevance, difficulty, clarity) before they're used in interviews.
- Inputs: Generated questions
- Outputs: Quality rating and suggestions
- Use Case: Question curation, QA testing for LLM-generated prompts
The Security Layer of evalverse_engine is designed to filter, guard, and protect your LLM outputs from unwanted or dangerous content.
It provides plug-and-play decorators and class-level utilities that prevent toxic, irrelevant, or inappropriate responses from leaking into your system.
Prevents offensive or sensitive keywords from appearing in LLM responses.
✅ Features:
- Loads a pickled list of banned words from `filter.sys_dump.key`.
- Allows dynamic addition of extra words.
- Compiles regex patterns for detection.
- Can be used as a decorator with fallback behavior.
```python
@ContentFilter.static_guard(fallback="Blocked for policy reasons")
def generate_response():
    return "some potentially offensive output"
```
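A minimal sketch of how such a keyword guard can work. The class and decorator names mirror the README, but the body (and the inline word list standing in for the pickled one) is an assumption:

```python
import re
from functools import wraps

class ContentFilter:
    BANNED = ["offensive", "hack"]  # stand-in for the pickled banned-word list

    @classmethod
    def static_guard(cls, fallback="", extra_words=()):
        words = cls.BANNED + list(extra_words)
        # Compile one alternation pattern over all banned words.
        pattern = re.compile(
            r"\b(" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
        )

        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                result = func(*args, **kwargs)
                # Replace the whole response if any banned word matches.
                return fallback if pattern.search(result) else result
            return wrapper
        return decorator

@ContentFilter.static_guard(fallback="Blocked for policy reasons")
def generate_response():
    return "some potentially offensive output"
```

Here `generate_response()` returns the fallback string, since the output contains a banned word.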
Uses a transformer-based hate speech model to evaluate how toxic a response is.
✅ Features:
- Uses `facebook/roberta-hate-speech-dynabench-r1-target` via HuggingFace Transformers.
- Supports threshold-based filtering.
- Provides decorators for automated guarding.

```python
@ToxicityFilter.static_guard(fallback="Sorry, that response wasn't appropriate.")
def toxic_response():
    return "you suck"
```
Ensures generated questions or text are contextually relevant using cosine similarity.
✅ Features:
- Uses `BAAI/bge-small-en-v1.5` with SentenceTransformer.
- Can filter out unrelated (context, question) pairs.
- Supports decorators for guarding question generators and other tools.

```python
@ContextRelevanceFilter.static_guard(context="Operating Systems", fallback="Not related.")
def generate_question():
    return "How do trees photosynthesize?"
```
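The cosine-similarity gate itself is straightforward; here it is on toy embedding vectors (in real usage, both strings would first be embedded with the sentence-transformer model named above):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_relevant(context_vec, question_vec, threshold=0.5):
    # Pass only pairs whose embeddings point in similar directions.
    return cosine_similarity(context_vec, question_vec) >= threshold
```

An "Operating Systems" context and a photosynthesis question would embed far apart, so their similarity falls below the floor and the fallback fires.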
The SecurityFilter is your all-in-one defensive wall that combines all three filters into one neat decorator.
🔧 Configurable Options:
- `fallback`: What to return when a check fails.
- `extra_words`: Add custom banned words.
- `toxic_threshold`: Set the toxicity limit.
- `similarity_threshold`: Set the semantic similarity floor.
- `context`: What the generated output should be relevant to.
```python
@SecurityFilter(
    fallback="Response blocked by security.",
    extra_words=["hack", "kill"],
    toxic_threshold=0.6,
    similarity_threshold=0.5,
    context="Software Engineering"
)
def generate_question():
    return "How to exploit a system using a buffer overflow?"
```
When wrapped around a function, the SecurityFilter runs the output through the following steps in order:
1. `ContentFilter` – banned keywords? ❌ blocked.
2. `ToxicityFilter` – hate/violence? ❌ blocked.
3. `ContextRelevanceFilter` – not related? ❌ blocked.

✅ If all checks pass, the response is allowed to proceed!
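The short-circuiting chain can be sketched as an ordered list of predicate checks, where the first failure swaps in the fallback. The check functions below are simple placeholders, not the real filter classes:

```python
from functools import wraps

def security_filter(checks, fallback):
    """Run `checks` on the wrapped function's output, in order.

    Each check is a predicate: True means the output passes. The first
    failing check replaces the output with `fallback` (short-circuit).
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            for check in checks:
                if not check(result):
                    return fallback
            return result
        return wrapper
    return decorator

# Placeholder predicates standing in for ContentFilter, ToxicityFilter,
# and ContextRelevanceFilter.
no_banned = lambda text: "hack" not in text.lower()
not_toxic = lambda text: "kill" not in text.lower()

@security_filter(checks=[no_banned, not_toxic],
                 fallback="Response blocked by security.")
def generate_question():
    return "How to hack a server?"
```

Running checks in increasing order of cost (keyword regex before transformer inference) keeps the common blocked case cheap.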
```
Security/
├── Components/
│   ├── content_filter.py
│   ├── toxicity_filter.py
│   └── context_relevance_filter.py
└── security_filter.py   ← Unified guard interface
```
🛡️ Securing Question Generators from bias or off-topic generation
🧼 Sanitizing User or Agent-generated content
🤖 Integrating into LLM chatbots for policy compliance
🔒 Locking down models used in education or enterprise
WebSearcher is a modular tool in the evalverse_engine framework used for searching the web using the Serper API. It's designed to be plugged into agent workflows for quick and structured web lookups.
- Takes a natural language search query.
- Uses Serper's web search API.
- Returns top search result snippets with title, link, and description.
- Handles API failures gracefully.
The tool is built on the `BaseTool` interface from `crewai.tools` and takes a single natural-language query string as its input schema.
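Graceful result handling along the lines described above might look like this; the `organic`/`title`/`link`/`snippet` fields follow Serper's documented response shape, while the function itself is an illustrative assumption:

```python
def parse_serper_results(response_json, limit=5):
    """Pull title/link/description triples out of a Serper search response.

    Missing fields and malformed payloads degrade gracefully to an
    empty list or empty strings instead of raising.
    """
    results = []
    for item in response_json.get("organic", [])[:limit]:
        results.append({
            "title": item.get("title", ""),
            "link": item.get("link", ""),
            "description": item.get("snippet", ""),
        })
    return results
```

A failed or empty API call simply yields `[]`, so downstream agents never see a raised exception.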
This module provides two main functionalities:
- One-Question Interview Simulation — Using voice input and adaptive LLM evaluation.
- Online Assessment Session (OA) — With difficulty-adaptive MCQs based on user performance.
Simulates a single interview question with:
- Random category
- LLM-generated question
- Voice-based answer input
- LLM-based evaluation + rationale
- Random category picked
- Securely generate a unique question via `InterviewQGen`
- Candidate gives spoken answer (recorded and transcribed)
- Answer evaluated via `InterviewQEval`
- Print results with rating + rationale
- Uses `SecurityFilter` to block toxic/redundant/questionable outputs

```shell
python one_question_interview.py
```