# human-evaluation

Here are 17 public repositories matching this topic...

Concept-Guided Chain-of-Thought (CGCoT) pairwise annotation tool for systematic text evaluation using LLMs. Generate breakdowns, compare items, compute scores, and validate against human judgments. Supports Ollama, Hugging Face, Google Gemini, OpenAI, and Anthropic models.

  • Updated Dec 18, 2025
  • Jupyter Notebook
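To make the CGCoT tool's "compare items, compute scores, and validate against human judgments" steps concrete, here is a minimal sketch. It is an assumption, not the tool's actual interface: `judge` stands in for one LLM pairwise comparison, scores are simple win rates rather than whatever aggregation the tool uses, and the human ranks are hypothetical.

```python
# Minimal sketch of pairwise scoring and validation (assumed, not the
# tool's real API): aggregate LLM pairwise judgments into per-item
# win-rate scores, then check agreement with human ratings.
from collections import defaultdict
from itertools import combinations
from scipy.stats import spearmanr

def win_rate_scores(items, judge):
    """`judge(a, b)` is a hypothetical callable returning the winner
    of one pairwise comparison; each item's score is its win rate."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for a, b in combinations(items, 2):
        wins[judge(a, b)] += 1
        games[a] += 1
        games[b] += 1
    return {item: wins[item] / games[item] for item in items}

# Toy usage with a dummy judge that prefers the longer text.
items = ["short answer", "a somewhat longer answer", "the longest answer of all"]
scores = win_rate_scores(items, judge=lambda a, b: max(a, b, key=len))
human = {items[0]: 1, items[1]: 2, items[2]: 3}  # hypothetical human ranks
rho, p = spearmanr([scores[i] for i in items], [human[i] for i in items])
print(scores, rho)
```

A real run would replace the dummy judge with a CGCoT prompt to one of the supported model backends and use more robust aggregation (e.g., Bradley-Terry) when comparisons are noisy.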

Fine-tunes a T5-small model on the TellMeWhy dataset using context injection from a large language model (Gemini) to improve causal reasoning for “why” questions in narratives. Combines efficient training with human and automated evaluations to assess impact.

  • Updated May 18, 2025
  • Jupyter Notebook
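The context-injection idea above can be sketched in a few lines. The input template, the `injected` string, and the example texts below are assumptions for illustration; in the repository the injected context would come from Gemini and the examples from the TellMeWhy dataset.

```python
# Hedged sketch of context injection for seq2seq fine-tuning (assumed
# format, not the repo's actual pipeline): prepend LLM-generated context
# to each "why" question before training T5-small.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

narrative = "Sam grabbed an umbrella before leaving the house."
question = "Why did Sam grab an umbrella?"
injected = "It was raining outside."  # would come from Gemini

# Assumed input template: injected context + narrative + question.
source = f"context: {injected} narrative: {narrative} question: {question}"
target = "Because it was raining and Sam did not want to get wet."

inputs = tokenizer(source, return_tensors="pt", truncation=True)
labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids

loss = model(**inputs, labels=labels).loss  # loss for one training example
loss.backward()
print(float(loss))
```

Wrapping this in a standard `Trainer` loop over the full dataset gives the efficient fine-tuning setup the description refers to.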

Source code for the data analysis and models accompanying the paper "Non-Binary Evaluation of Next-Basket Food Recommendation" (published in User Modeling and User-Adapted Interaction, 2024). This repository implements the paper's novel non-binary evaluation metrics and the accompanying food recommender models.

  • Updated Oct 28, 2025
  • Jupyter Notebook
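The paper's own metrics are not reproduced in this listing, so the following is only a generic illustration of what "non-binary" evaluation means in this setting: instead of counting a recommended item as a 0/1 hit, credit it with a graded relevance, here the item's share of the user's true next basket.

```python
# Illustrative sketch only, not the paper's actual metric: a graded
# recall that weights each hit by how much of the next basket it covers.
from collections import Counter

def graded_recall_at_k(recommended, next_basket, k):
    """Fraction of the next basket's total quantity covered by the
    top-k recommendations (1.0 = perfect, 0.0 = no overlap)."""
    counts = Counter(next_basket)           # item -> quantity consumed
    total = sum(counts.values())
    hit = sum(counts[item] for item in recommended[:k])
    return hit / total if total else 0.0

# Toy example: bread dominates the basket, so recommending it earns
# more credit than a binary hit-rate would reflect.
basket = ["bread", "bread", "bread", "milk"]
print(graded_recall_at_k(["bread", "eggs"], basket, k=2))  # 0.75
```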
