Self-hosted LLM chatbot arena, with yourself as the only judge
Find informative examples to efficiently human-evaluate NLG models.
Code and data for the paper "Achieving Reliable Human Assessment of Open-Domain Dialogue Systems"
Official repository for Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents
Code for "Multidimensional Evaluation for Text Style Transfer Using ChatGPT" and "Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer" (HumEval 2022)
Code for "QE4PE: Word-level Quality Estimation for Human Post-Editing" ✍️
Annotation of success and failure cases in linguistic simplification 💃
A collection of MTurk templates designed to make complex tasks easier for human annotators.
CS 685 Advanced Natural Language Processing Project: Learning Schematic and Contextual Representations for Text-to-SQL Parsing
Concept-Guided Chain-of-Thought (CGCoT) pairwise annotation tool for systematic text evaluation using LLMs. Generate breakdowns, compare items, compute scores, and validate against human judgments. Supports Ollama, Hugging Face, Google Gemini, OpenAI, and Anthropic models.
Chatbot for IIIT Nagpur built using fine-tuning and RAG
Fine-tunes a T5-small model on the TellMeWhy dataset using context injection from a large language model (Gemini) to improve causal reasoning for “why” questions in narratives. Combines efficient training with human and automated evaluations to assess impact.
ECE1508 Applied Deep Learning group project
Short tutorial on how to conduct a human study using Amazon Mechanical Turk and Google Drive
A web-based evaluation platform for human assessment of LLM-generated Korean SAT reading comprehension passages
Source code for the data analysis and models accompanying the paper: "Non-Binary Evaluation of Next-Basket Food Recommendation" (published in User Modeling and User-Adapted Interaction, 2024). This repository implements novel non-binary evaluation metrics and recommender models for food recommendation.