Self-hosted LLM chatbot arena, with yourself as the only judge
Find informative examples to efficiently human-evaluate NLG models.
Code and data for the paper "Achieving Reliable Human Assessment of Open-Domain Dialogue Systems"
Official repository for Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents
Code for "Multidimensional Evaluation for Text Style Transfer Using ChatGPT" and "Human Judgement as a Compass to Navigate Automatic Metrics for Formality Transfer" (HumEval 2022)
Code for "QE4PE: Word-level Quality Estimation for Human Post-Editing" ✍️
Annotation of success and failure cases in linguistic simplification 💃
A collection of MTurk templates designed to make complex tasks easier for human annotators.
CS 685 Advanced Natural Language Processing Project: Learning Schematic and Contextual Representations for Text-to-SQL Parsing
Concept-Guided Chain-of-Thought (CGCoT) pairwise annotation tool for systematic text evaluation using LLMs. Generate breakdowns, compare items, compute scores, and validate against human judgments. Supports Ollama, Hugging Face, Google Gemini, OpenAI, and Anthropic models.
Chatbot for IIIT Nagpur built using fine-tuning and RAG
Fine-tunes a T5-small model on the TellMeWhy dataset using context injection from a large language model (Gemini) to improve causal reasoning for “why” questions in narratives. Combines efficient training with human and automated evaluations to assess impact.
ECE1508 Applied Deep Learning group project
Short tutorial on how to conduct a human study using Amazon Mechanical Turk and Google Drive
A web-based evaluation platform for human assessment of LLM-generated Korean SAT reading comprehension passages
Source code for the data analysis and models accompanying the paper: "Non-Binary Evaluation of Next-Basket Food Recommendation" (published in User Modeling and User-Adapted Interaction, 2024). This repository implements novel non-binary evaluation metrics and recommender models for food recommendation.