
AI Evaluation Framework

A framework for systematically evaluating AI systems — starting with RAG (Retrieval-Augmented Generation) pipelines, with support for agent evaluations and more.

Modules

RAG Evaluation

Evaluate RAG pipelines using the LangSmith SDK, with the following configurable stages:

  1. Data pre-processing (KB, i.e. the knowledge base)
  2. Synthetic data generation
  3. Chunking strategy
  4. Embedding model
    • 4.1 Custom embedding model (for plugging in your own vector store or database)
  5. k parameter, i.e. the number of documents retrieved
  6. Re-ranker (optional)

RAG — Single Evaluation

from rag_evaluation_framework import Evaluation

evaluation = Evaluation(
    langsmith_dataset_name="my-dataset",
    kb_data_path="./knowledge_base"
)

results = evaluation.run(
    chunker=my_chunker,
    embedder=my_embedder,
    vector_store=my_vector_store,  # optional, defaults to Chroma
    k=5,
    reranker=my_reranker  # optional
)

RAG — Hyperparameter Sweep

Run multiple configurations at once and compare results:

from rag_evaluation_framework import Evaluation, SweepConfig
from rag_evaluation_framework.evaluation.chunker import RecursiveCharTextSplitter
from rag_evaluation_framework.evaluation.embedder.openai_embedder import OpenAIEmbedder

evaluation = Evaluation(
    langsmith_dataset_name="my-dataset",
    kb_data_path="./knowledge_base"
)

sweep_results = evaluation.sweep(
    sweep_config=SweepConfig(
        chunkers=[
            RecursiveCharTextSplitter(chunk_size=500, chunk_overlap=50),
            RecursiveCharTextSplitter(chunk_size=1000, chunk_overlap=100),
        ],
        embedders=[
            OpenAIEmbedder(model_name="text-embedding-3-small"),
            OpenAIEmbedder(model_name="text-embedding-3-large"),
        ],
        k_values=[5, 10, 20],
        rerankers=[None],
    )
)

Combinations sharing the same (chunker, embedder) pair reuse the chunked and embedded knowledge base, so you don't pay for redundant embedding API calls.
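To see the savings concretely: the sweep above runs the full Cartesian product of options, but embedding cost scales only with the number of distinct (chunker, embedder) pairs. A quick back-of-the-envelope check (the string labels are placeholders standing in for the objects above):

```python
from itertools import product

# Placeholder labels for the sweep options in the example above
chunkers = ["recursive-500", "recursive-1000"]
embedders = ["text-embedding-3-small", "text-embedding-3-large"]
k_values = [5, 10, 20]
rerankers = [None]

# Every combination is evaluated...
configs = list(product(chunkers, embedders, k_values, rerankers))

# ...but the knowledge base only needs chunking + embedding once
# per distinct (chunker, embedder) pair.
kb_builds = {(c, e) for c, e, _, _ in configs}

print(len(configs))    # 12 evaluation runs
print(len(kb_builds))  # 4 chunk+embed passes
```

So this sweep makes 12 evaluation runs but only 4 embedding passes over the knowledge base; varying k and the re-ranker is essentially free.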

RAG — Visualization

from rag_evaluation_framework import ComparisonGraph

graph = ComparisonGraph(sweep_results)
graph.bar()           # grouped bar chart
graph.line(x="k")     # line chart varying k
graph.heatmap()       # colour-coded grid

Documentation

See the docs/ folder for detailed guides.
