A modular LLM-powered agent framework for conducting smart interviews and adaptive MCQ assessments — voice, text, JSON, filters, all packed into one badass pipeline.

SuryaAnything/Evalverse-Complete


Evalverse

⚙️ evalverse_engine: Python Distribution by Evalverse

Welcome to evalverse_engine — your plug-and-play, multi-agent LLM framework built for
evaluating, extracting, and enhancing AI-generated outputs across text, JSON, audio, and more.

This isn't just another LLM wrapper. This is LangChain x CrewAI x Groq/OpenAI/ChatLite, reimagined into a
modular system with decorator-powered orchestration, automatic agent registration, and support for multi-modal I/O pipelines.

🧠 What’s Inside the Evalverse?

🔧 LLM_Driver Decorator

Orchestrate your whole crew with a single decorator.
Injects runtime logic to auto-register and execute agents — no boilerplate. Supports:

🌐 Groq, 🔓 OpenAI, and ⚡ ChatLite

🔁 Auto method injection: run_<agent_name>()

✅ Built-in run_all() for batch execution
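The decorators themselves live in core/driver.py; as a rough illustration of the mechanism (not the real implementation — every name below is an assumption), the auto-registration and run_<agent_name>() injection could look like this:

```python
# Illustrative sketch only -- the real LLM_Driver / @LLM_Agent live in
# core/driver.py and bind actual LLMs; this toy version just shows the
# registration + method-injection pattern.

def LLM_Agent(name):
    """Mark a method as an agent factory under a registry name."""
    def mark(fn):
        fn._agent_name = name
        return fn
    return mark

def LLM_Driver(cls):
    """Collect @LLM_Agent methods, inject run_<name>() and run_all()."""
    agents = {
        fn._agent_name: fn
        for fn in vars(cls).values()
        if callable(fn) and hasattr(fn, "_agent_name")
    }

    def make_runner(factory):
        def run(self):
            spec = factory(self)  # build the agent spec dict
            # A real driver would bind an LLM here; we just echo the goal.
            return f"[{spec['role']}] {spec['goal']}"
        return run

    for name, factory in agents.items():
        setattr(cls, f"run_{name}", make_runner(factory))

    def run_all(self):
        return {name: getattr(self, f"run_{name}")() for name in agents}
    cls.run_all = run_all
    return cls

@LLM_Driver
class Crew:
    @LLM_Agent("json_extractor")
    def json_extractor(self):
        return {"role": "Parser", "goal": "Extract valid JSON"}
```

Decorating the class gives you `crew.run_json_extractor()` and `crew.run_all()` without writing either method by hand.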

🤖 @LLM_Agent Decorator

Just slap this on your agent methods and boom — it’s registered, initialized, and LLM-bound automatically.

@LLM_Agent("json_extractor")
def json_extractor(self):
    return {
        "role": "Parser",
        "goal": "Extract valid JSON from noisy LLM outputs",
        "backstory": "Trained on corrupted prompts and StackOverflow answers.",
        "description": lambda: "Clean and parse the given blob to return valid JSON.",
        "expected_output": "JSON object",
    }

🛠 Agent Architecture

Agents get dynamically hooked with:

LLMs (via LangChain)

Custom toolsets (functions, extractors, converters, etc.)

Backstories, roles, goals (for context-aware interaction)

🧰 Built-in Modules

📄 pdf2text.py

Extract clean text from PDFs. Comes with:

.to_lower(), .to_upper() for post-processing

__getitem__() to access characters directly

.append_front() and .append_back() to modify text dynamically

.write_to_path() to save output
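The wrapper class that carries these methods isn't named in this README, so here's a hypothetical ExtractedText showing the interface shape described above (class name and details are assumptions, not the real pdf2text.py code):

```python
# Hypothetical sketch of the pdf2text interface shape -- the real class
# in modules/pdf2text.py may be named and implemented differently.

class ExtractedText:
    def __init__(self, text: str):
        self.text = text

    def to_lower(self):
        self.text = self.text.lower()
        return self

    def to_upper(self):
        self.text = self.text.upper()
        return self

    def __getitem__(self, i):
        # Direct character (or slice) access into the extracted text.
        return self.text[i]

    def append_front(self, prefix: str):
        self.text = prefix + self.text
        return self

    def append_back(self, suffix: str):
        self.text = self.text + suffix
        return self

    def write_to_path(self, path: str):
        # Save the current text to disk.
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)
```

Returning `self` from each mutator lets calls chain: `ExtractedText("Hello").to_lower().append_back("!")`.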

🔊 voice2text.py

Record audio on the fly and transcribe using Whisper (Groq-powered). Perfect for converting voice notes to structured text.

# Records 10 seconds of audio
record_audio("voice.wav", duration=10)

# Uses Whisper-large-v3 to transcribe
transcript = whisper_transcribe("voice.wav")

📦 str2json.py

Extracts structured data from noisy LLM blobs. Includes:

extract_all_json_objects(): returns list of all detected JSONs

extract_first_json_safe(): only gets clean top-level JSONs

extract_first_any_json(): deeply scans for valid nested JSONs

pretty_print_json(): beautifies output for logs or responses
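As a sketch of how the "safe" top-level extraction can work (the real str2json.py may use a different strategy), scanning for `{` and handing each candidate to json.JSONDecoder.raw_decode does the job:

```python
import json

def extract_first_json_safe(blob: str):
    """Return the first valid top-level JSON object found in a noisy
    string, or None if nothing parses. Illustrative reimplementation --
    not necessarily what modules/str2json.py does internally."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(blob):
        if ch == "{":
            try:
                obj, _end = decoder.raw_decode(blob, i)
                return obj
            except json.JSONDecodeError:
                continue  # not valid JSON starting here; keep scanning
    return None
```

raw_decode tolerates trailing garbage after the object, which is exactly what noisy LLM blobs need.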

🔥 Why Evalverse Slaps

🔌 Plug in any LLM: Groq, OpenAI, ChatLite — all play nice.

🧩 Dynamic Agent Framework: No hardcoding, everything is modular.

📜 Config Driven: Easily switch models, keys, and endpoints via llm_config.yaml

🧪 Perfect for Testing & Evaluation Pipelines: Especially in multi-agent flows.

📂 File Structure

evalverse_engine/
├── core/
│   ├── driver.py          # Main orchestration logic (LLM_Driver, @LLM_Agent, config loader)
├── modules/
│   ├── pdf2text.py        # PDF text extractor with rich text utils
│   ├── str2json.py        # JSON cleanup from noisy blobs
│   ├── voice2text.py      # Whisper-based voice transcription
├── llm_config.yaml        # API keys + model settings
└── README.md              # You're reading it

📦 Agent List & Responsibilities

1. CandidateProfileAgent

Role: Extracts structured profile information from resumes or unstructured documents.

  • Inputs: Raw candidate text (from PDF, etc.)
  • Outputs: JSON with name, email, experience, skills, education, etc.
  • Use Case: Resume screening, profile normalization

2. JobRequirementAnalyzer

Role: Parses job descriptions to extract key technical and soft skill requirements.

  • Inputs: Job description text
  • Outputs: Structured JSON of job criteria
  • Use Case: Job-candidate matching, interview prep

3. InterviewQGen

Role: Generates relevant interview questions based on the candidate profile and job requirements.

  • Inputs: Candidate profile JSON, job requirement JSON
  • Outputs: List of context-aware interview questions
  • Use Case: Automated interview design, skill assessment

4. InterviewQEval

Role: Evaluates candidate answers using rubric-based criteria: correctness, depth, clarity, and conciseness.

  • Inputs: Candidate answer, expected answer/rubric
  • Outputs: Evaluation score + qualitative feedback
  • Use Case: Technical interview grading, soft skill evaluation

5. QuestionGenerator

Role: Dynamically generates new questions based on candidate's past performance and skill gaps.

  • Inputs: Candidate response history, evaluation metrics
  • Outputs: Follow-up or challenge questions with increased or decreased difficulty
  • Use Case: Adaptive testing, personalized questioning

6. QuestionEvaluator

Role: Rates questions for quality (relevance, difficulty, clarity) before they're used in interviews.

  • Inputs: Generated questions
  • Outputs: Quality rating and suggestions
  • Use Case: Question curation, QA testing for LLM-generated prompts

🔐 Security Layer

The Security Layer of evalverse_engine is designed to filter, guard, and protect your LLM outputs from unwanted or dangerous content.
It provides plug-and-play decorators and class-level utilities that prevent toxic, irrelevant, or inappropriate responses from leaking into your system.

🧩 Modules Included

ContentFilter – 🚫 Banned Word Detector

Prevents offensive or sensitive keywords from appearing in LLM responses.

✅ Features:

  • Loads a pickled list of banned words from filter.sys_dump.key
  • Allows dynamic addition of extra words
  • Compiles regex patterns for detection
  • Can be used as a decorator with fallback behavior

@ContentFilter.static_guard(fallback="Blocked for policy reasons")
def generate_response():
    return "some potentially offensive output"

ToxicityFilter – ☣️ Language Toxicity Guard

Uses a transformer-based hate speech model to evaluate how toxic a response is.

✅ Features:

  • Uses facebook/roberta-hate-speech-dynabench-r1-target via HuggingFace Transformers
  • Supports threshold-based filtering
  • Provides decorators for automated guarding

@ToxicityFilter.static_guard(fallback="Sorry, that response wasn't appropriate.")
def toxic_response():
    return "you suck"

ContextRelevanceFilter – 🎯 Semantic Similarity Checker

Ensures generated questions or text are contextually relevant using cosine similarity.

✅ Features:

  • Uses BAAI/bge-small-en-v1.5 with SentenceTransformer
  • Can filter out unrelated (context, question) pairs
  • Supports decorators for guarding question generators and other tools

@ContextRelevanceFilter.static_guard(context="Operating Systems", fallback="Not related.")
def generate_question():
    return "How do trees photosynthesize?"

🛡️ Master Decorator: SecurityFilter

The SecurityFilter is your all-in-one defensive wall that combines all three filters into one neat decorator.

🔧 Configurable Options:

  • fallback: What to return when a check fails
  • extra_words: Add custom banned words
  • toxic_threshold: Set toxicity limit
  • similarity_threshold: Set semantic similarity floor
  • context: What the generated output should be relevant to

@SecurityFilter(
    fallback="Response blocked by security.",
    extra_words=["hack", "kill"],
    toxic_threshold=0.6,
    similarity_threshold=0.5,
    context="Software Engineering"
)
def generate_question():
    return "How to exploit a system using a buffer overflow?"

🧪 How It Works

When wrapped around a function, the SecurityFilter runs the output through the following steps in order:

ContentFilter – banned keywords? ❌ blocked.

ToxicityFilter – hate/violence? ❌ blocked.

ContextRelevanceFilter – not related? ❌ blocked.

✅ If all checks pass: allowed to proceed!
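The check order above can be sketched as a toy composed decorator. This is not the real SecurityFilter — the actual filters use a pickled word list, a HuggingFace toxicity model, and sentence-embedding similarity; here the last two checks are injected callables so only the control flow is shown:

```python
import re
from functools import wraps

def security_filter(fallback, banned=(), toxic=None, relevant=None):
    """Toy composition of the three checks in SecurityFilter order.
    First failing check returns the fallback; everything here is a
    stand-in for the real model-backed filters."""
    pattern = re.compile("|".join(map(re.escape, banned)), re.I) if banned else None

    def deco(fn):
        @wraps(fn)
        def guarded(*args, **kwargs):
            out = fn(*args, **kwargs)
            if pattern and pattern.search(out):    # 1. ContentFilter
                return fallback
            if toxic and toxic(out):               # 2. ToxicityFilter
                return fallback
            if relevant and not relevant(out):     # 3. ContextRelevanceFilter
                return fallback
            return out                             # all checks passed
        return guarded
    return deco

@security_filter("Blocked.", banned=["exploit"])
def risky_question():
    return "How to exploit a buffer overflow?"
```

Here the banned-word check fires first, so `risky_question()` returns the fallback before any later check runs.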

📁 Folder Structure

Security/
├── Components/
│   ├── content_filter.py
│   ├── toxicity_filter.py
│   └── context_relevance_filter.py
└── security_filter.py  ← Unified guard interface

🧠 Use Cases

🛡️ Securing Question Generators from bias or off-topic generation

🧼 Sanitizing User or Agent-generated content

🤖 Integrating into LLM chatbots for policy compliance

🔒 Locking down models used in education or enterprise

WebSearcher Tool

WebSearcher is a modular tool in the evalverse_engine framework used for searching the web using the Serper API. It's designed to be plugged into agent workflows for quick and structured web lookups.

🔧 Features

  • Takes a natural language search query.
  • Uses Serper's web search API.
  • Returns top search result snippets with title, link, and description.
  • Handles API failures gracefully.

🧠 Usage

The tool is built on the BaseTool interface from crewai.tools, and its input schema takes a single natural-language search query.
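The actual schema isn't reproduced in this README. As a minimal sketch, assuming a single `query` field and Serper's `q`/`num` request-body fields (crewai normally wants a pydantic model for args_schema; a plain dataclass stands in for it here, and all names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class WebSearchInput:
    query: str  # natural-language search query

def build_serper_payload(inp: WebSearchInput, num_results: int = 5) -> dict:
    """Build the JSON body a Serper search request would send.
    Hypothetical field names -- check Serper's API docs before use."""
    return {"q": inp.query, "num": num_results}
```

The real WebSearcher would POST this payload with the Serper API key header and unpack title/link/description from each result.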

🧠 Evalverse Interview & OA Simulation

This module provides two main functionalities:

  1. One-Question Interview Simulation — Using voice input and adaptive LLM evaluation.
  2. Online Assessment Session (OA) — With difficulty-adaptive MCQs based on user performance.

🎤 One-Question Interview

Simulates a single interview question with:

  • Random category
  • LLM-generated question
  • Voice-based answer input
  • LLM-based evaluation + rationale

✅ Flow:

  1. Random category picked
  2. Securely generate a unique question via InterviewQGen
  3. Candidate gives spoken answer (recorded and transcribed)
  4. Answer evaluated via InterviewQEval
  5. Print results with rating + rationale

🔐 Security:

  • Uses SecurityFilter to block toxic/redundant/questionable outputs.
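The five-step flow above can be sketched as one orchestration function. The callables are injected so the real pieces (InterviewQGen, record_audio + whisper_transcribe, InterviewQEval) can be swapped in; this wiring is illustrative, not the actual one_question_interview.py:

```python
import random

def one_question_interview(categories, generate_q, get_answer, evaluate):
    """Illustrative wiring of the one-question interview flow."""
    category = random.choice(categories)            # 1. random category
    question = generate_q(category)                 # 2. generate question
    answer = get_answer(question)                   # 3. spoken answer -> text
    rating, rationale = evaluate(question, answer)  # 4. evaluate the answer
    return {                                        # 5. results + rationale
        "category": category,
        "question": question,
        "answer": answer,
        "rating": rating,
        "rationale": rationale,
    }
```

In the real script, `get_answer` is where the recording and Whisper transcription happen, and the generate/evaluate callables are the security-wrapped agents.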

🧪 Run:

python one_question_interview.py
