Awesome-Agent-Hallucinations

Welcome to the Awesome-Agent-Hallucinations repository! This repository is a curated library of works on hallucinations in Large Language Model (LLM)-based agents. For a detailed introduction, please refer to our survey paper:

LLM-based Agents Suffer from Hallucinations: A Survey of Taxonomy, Methods, and Directions

🤩An overview of agent goal completion.
  • Within the loop, the LLM-based agent carries out external behaviors, including reasoning, execution, perception, and memorization, guided by its internal belief state.
  • Throughout this process, the environment evolves dynamically in response to the agent's decisions, while task allocation within the LLM-based multi-agent system further helps fulfill user requirements. A minimal code sketch of this loop is given below.
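To make the loop concrete, here is a minimal sketch in Python. It is illustrative only: the `llm` and `env` objects and their methods (`reason`, `execute`, `perceive`, `goal_reached`) are assumed placeholders, not an API from any paper listed here.

    from dataclasses import dataclass, field

    @dataclass
    class Belief:
        """The agent's internal belief state: its working view of the task."""
        goal: str
        observations: list = field(default_factory=list)

    def agent_loop(llm, env, goal: str, max_steps: int = 10) -> Belief:
        """Run one reason-execute-perceive-memorize loop until the goal is met.
        `llm` and `env` are hypothetical stand-ins for a language model and
        an environment interface."""
        belief = Belief(goal=goal)
        for _ in range(max_steps):
            action = llm.reason(belief)        # reasoning: plan the next action from beliefs
            env.execute(action)                # execution: act on the environment (e.g., a tool call)
            obs = env.perceive()               # perception: observe the environment's new state
            belief.observations.append(obs)    # memorization: fold the observation into beliefs
            if env.goal_reached(goal):         # the environment evolves after each action
                break
        return belief

Each survey category below corresponds to a failure point in this loop: hallucinations can arise in reasoning, execution (tool use), perception, memorization, or inter-agent communication.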

🌟1 Reasoning Hallucinations

  • LLM+MAP: Bimanual Robot Task Planning Using Large Language Models and Planning Domain Definition Language arXiv Code

  • Large language models for planning: A comprehensive and systematic survey arXiv

  • Plan-then-execute: An empirical study of user trust and team performance when using llm agents as a daily assistant CHI

  • Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents CSUR Code

  • Divide and conquer in multi-agent planning AAAI

  • HM-RAG: Hierarchical multi-agent multimodal retrieval augmented generation arXiv Code

  • AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges arXiv

  • REMAC: Self-Reflective and Self-Evolving Multi-Agent Collaboration for Long-Horizon Robot Manipulation arXiv Code

  • MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems arXiv Code

  • Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph ICRA Code

  • Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions ICLR

  • Why Does ChatGPT Fall Short in Providing Truthful Answers? ICBINB

  • Do LLMs Understand Ambiguity in Text? A Case Study in Open-world Question Answering IEEE BigData

  • We're Afraid Language Models Aren't Modeling Ambiguity EMNLP Code

  • Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training ICLR Code

  • Goal Awareness for Conversational AI: Proactivity, Non-collaborativity, and Beyond ACL

  • Active Task Disambiguation with LLMs ICLR

  • Conversational health agents: A personalized LLM-powered agent framework JAMIA Code

  • OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation CVPR Code

  • Navigating the overkill in large language models ACL

  • Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting NAACL

  • AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents IJCAI Code

  • Agents of Change: Self-Evolving LLM Agents for Strategic Planning

  • Know Where You’re Uncertain When Planning with Multimodal Foundation Models: A Formal Framework

  • BeamAggR: Beam Aggregation Reasoning over Multi-source Knowledge for Multi-hop Question Answering

  • RAG-KG-IL: A multi-agent hybrid framework for reducing hallucinations and enhancing LLM reasoning through RAG and incremental knowledge graph learning integration

  • KnowAgent: Knowledge-augmented planning for LLM-based agents

  • Text-image alignment for diffusion-based perception CVPR

  • Vision-language model-based physical reasoning for robot liquid perception IROS

  • VECSR: Virtually Embodied Common Sense Reasoning System

  • Web agents with world models: Learning and leveraging environment dynamics in web navigation

  • RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects

  • Large language models as commonsense knowledge for large-scale task planning NeurIPS

  • Focus on your question! Interpreting and mitigating toxic CoT problems in commonsense reasoning

  • Knowledge Verification to Nip Hallucination in the Bud EMNLP

  • Neural-symbolic methods for knowledge graph reasoning: A survey ACM

  • Contrastive chain-of-thought prompting

  • Can LLM be a good path planner based on prompt engineering? Mitigating the hallucination for path planning

  • Insight-V: Exploring long-chain visual reasoning with multimodal large language models

  • Do I know this entity? Knowledge awareness and hallucinations in language models

  • KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs

  • Robotouille: An Asynchronous Planning Benchmark for LLM Agents

  • Review of Case-Based Reasoning for LLM Agents: Theoretical Foundations, Architectural Components, and Cognitive Integration

  • Uncertainty of thoughts: Uncertainty-aware planning enhances information seeking in large language models

  • Language models, agent models, and world models: The LAW for machine reasoning and planning

  • Towards reasoning in large language models: A survey

  • Agent Capability Negotiation and Binding Protocol (ACNBP)

  • Enhance Mobile Agents Thinking Process Via Iterative Preference Learning

  • TARGA: Targeted Synthetic Data Generation for Practical Reasoning over Structured Data

  • Attentive Reasoning Queries: A Systematic Method for Optimizing Instruction-Following in Large Language Models

  • LISA: Reasoning segmentation via large language model

  • WorldEval: World Model as Real-World Robot Policies Evaluator

  • Introspective Planning: Aligning Robots’ Uncertainty with Inherent Task Ambiguity

  • A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

  • The Flan Collection: Designing data and methods for effective instruction tuning

  • Mars-PO: Multi-Agent Reasoning System Preference Optimization

  • Reasoning on graphs: Faithful and interpretable large language model reasoning

🌟2 Execution Hallucinations

  • Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks ACL models
  • APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls CoRR
  • ToolACE: Winning the points of LLM function calling arXiv models datasets
  • ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers NeurIPS Code
  • ACEBench: Who Wins the Match Point in Tool Learning?
  • ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
  • Learning Evolving Tools for Large Language Models
  • Tool Unlearning for Tool-Augmented LLMs
  • Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists
  • ToolRegistry: A Protocol-Agnostic Tool Management Library for Function-Calling LLMs
  • ReTool: Reinforcement learning for strategic tool use in LLMs
  • GeckOpt: LLM system efficiency via intent-based tool selection
  • RAG-MCP: Mitigating prompt bloat in LLM tool selection via retrieval-augmented generation
  • Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
  • CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
  • RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing
  • NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models
  • GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation
  • Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
  • An LLM compiler for parallel function calling
  • ControlLLM: Augment Language Models with Tools by Searching on Graphs
  • OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
  • Graph RAG-Tool Fusion
  • ScaleMCP: Dynamic and Auto-Synchronizing Model Context Protocol Tools for LLM Agents

🌟3 Perception Hallucinations

  • Comparative Analysis of AI Agent Architectures for Entity Relationship Classification
  • Ferret: Refer and ground anything anywhere at any granularity
  • Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
  • Re-Aligning Language to Visual Objects with an Agentic Workflow
  • Mitigating hallucination in visual language models with visual supervision
  • Multi-News+: Cost-efficient dataset cleansing via LLM-based data annotation
  • Editing Factual Knowledge in Language Models
  • Seeing is believing: Mitigating hallucination in large vision-language models via CLIP-guided decoding
  • EMMA: Efficient Visual Alignment in Multi-Modal LLMs
  • Asynchronous LLM Function Calling
  • HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
  • Detecting and preventing hallucinations in large vision language models
  • ContextualLVLM-Agent: A Holistic Framework for Multi-Turn Visually-Grounded Dialogue and Complex Instruction Following
  • G-retriever: Retrieval-augmented generation for textual graph understanding and question answering
  • Causal-LLaVA: Causal Disentanglement for Mitigating Hallucination in Multimodal Large Language Models
  • VCoder: Versatile vision encoders for multimodal large language models
  • Hydra: An Agentic Reasoning Approach for Enhancing Adversarial Robustness and Mitigating Hallucinations in Vision-Language Models
  • Text-image alignment for diffusion-based perception
  • Vision-language model-based physical reasoning for robot liquid perception
  • Large language model agent for fake news detection
  • Beyond Bare Queries: Open-Vocabulary Object Grounding with 3D Scene Graph

🌟4 Memorization Hallucinations

  • Knowledge unlearning for llms: Tasks, methods, and challenges CoRR
  • MIRIX: Multi-Agent Memory System for LLM-Based Agents
  • What You Want to Forget: Efficient Unlearning for LLMs
  • Improving factuality with explicit working memory
  • Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations
  • Flashattention: Fast and memory-efficient exact attention with io-awareness
  • Who's Harry Potter? Approximate unlearning in LLMs
  • AlphaEdit: Null-space constrained model editing for language models
  • Does fine-tuning LLMs on new knowledge encourage hallucinations?
  • Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
  • Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
  • Memory sandbox: Transparent and interactive memory management for conversational agents
  • Editing models with task arithmetic
  • AIstorian lets AI be a historian: A KG-powered multi-agent system for accurate biography generation
  • PMET: Precise Model Editing in a Transformer
  • Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
  • MemOS: A memory OS for AI system
  • Unveiling the pitfalls of knowledge editing for large language models

🌟5 Communication Hallucinations

  • Autonomous Agents for Collaborative Task under Information Asymmetry NeurIPS Code
  • LLMs Working in Harmony: A Survey on the Technological Aspects of Building Effective LLM-Based Multi Agent Systems arXiv
  • Cautiously-optimistic knowledge sharing for cooperative multi-agent reinforcement learning AAAI
  • Challenges in Human-Agent Communication arXiv
  • Preventing Rogue Agents Improves Multi-Agent Collaboration
  • Why Do Multi-Agent LLM Systems Fail?
  • Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
  • MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
  • AgentDropout: Dynamic Agent Elimination for Token-Efficient and High-Performance LLM-Based Multi-Agent Collaboration
  • ChatEval: Towards better LLM-based evaluators through multi-agent debate
  • AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
  • Internet of agents: Weaving a web of heterogeneous agents for collaborative intelligence
  • Optima: Optimizing effectiveness and efficiency for LLM-based multi-agent system
  • Goal Awareness for Conversational AI: Proactivity, Non-collaborativity, and Beyond
  • Enhancing Multi-Agent Consensus through Third-Party LLM Integration: Analyzing Uncertainty and Mitigating Hallucinations in Large Language Models
  • A survey of agent interoperability protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP)
  • Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
  • Embodied LLM agents learn to cooperate in organized teams
  • MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
  • CP-AgentNet: Autonomous and Explainable Communication Protocol Design Using Generative Agents
  • CAMEL: Communicative agents for "mind" exploration of large language model society
  • Theory of mind for multi-agent collaboration via large language models
  • Encouraging divergent thinking in large language models through multi-agent debate
  • A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
  • Merge, ensemble, and cooperate! A survey on collaborative strategies in the era of large language models

🌟6 Mitigating Hallucinations

  • Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
  • Towards Mitigating LLM Hallucination via Self Reflection
  • Hallucination augmented contrastive learning for multimodal large language model
  • Tug-of-war between knowledge: Exploring and resolving knowledge conflicts in retrieval-augmented language models
  • Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models
  • Simulation Agent: A Framework for Integrating Simulation and Large Language Models for Enhanced Decision-Making
  • Contrastive representation learning: A framework and review
  • Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
  • Mitigating object hallucinations in large vision-language models through visual contrastive decoding
  • Banishing LLM hallucinations requires rethinking generalization
  • Treble counterfactual VLMs: A causal approach to hallucination
  • When hindsight is not 20/20: Testing limits on reflective thinking in large language models
  • Contrastive modality-disentangled learning for multimodal recommendation
  • Generative Causality-driven Network for Graph Multi-task Learning
  • Advances and challenges in foundation agents: From brain-inspired intelligence to evolutionary, collaborative, and safe systems
  • Mitigating hallucination in large multi-modal models via robust instruction tuning
  • Reducing hallucinations in large vision-language models via latent space steering
  • Visual-RFT: Visual reinforcement fine-tuning
  • AgentAuditor: Human-level safety and security evaluation for LLM agents

🌟7 Detecting Hallucinations

  • LettuceDetect: A hallucination detection framework for RAG applications
  • Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection
  • SEVADE: Self-Evolving Multi-Agent Analysis with Decoupled Evaluation for Hallucination-Resistant Irony Detection
