Hi, I noticed some inconsistencies in the ground truth labels and descriptions for agent_multi_step samples.
1. Logical Inconsistency in Message Deletion
In these samples, the user explicitly instructs: "If the message capacity is full, delete the most recent message."
However, the current ground truth performs the following:
- Calls `get_earliest_message_id()` instead of `get_latest_message_id()`.
- Deletes `message_id=3` instead of `message_id=4` (assuming 4 is the latest).
💡Suggested Fix: The ground truth should be updated to call the function that retrieves the latest message ID to align with the user's prompt.
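To make the intended behavior concrete, here is a minimal sketch of the corrected sequence. Only `get_earliest_message_id()` and `get_latest_message_id()` are named in this issue; the `MessageStore` class, `is_full()`, `delete_message()`, and the assumption that a larger ID means a more recent message are hypothetical stand-ins, not the benchmark's actual API.

```python
# Hypothetical stand-in for the benchmark's message store. Only the two
# getter names come from this issue; everything else is assumed.
class MessageStore:
    def __init__(self, messages, capacity):
        self.messages = dict(messages)  # message_id -> text
        self.capacity = capacity

    def is_full(self):
        return len(self.messages) >= self.capacity

    def get_earliest_message_id(self):
        # Assumption: the smallest ID is the oldest message.
        return min(self.messages)

    def get_latest_message_id(self):
        # Assumption: the largest ID is the most recent message.
        return max(self.messages)

    def delete_message(self, message_id):
        del self.messages[message_id]


def free_space_by_deleting_latest(store):
    """Follow the prompt: when capacity is full, delete the most recent message."""
    if not store.is_full():
        return None
    message_id = store.get_latest_message_id()
    store.delete_message(message_id)
    return message_id


# Two stored messages with IDs 3 and 4, matching the example in this issue.
store = MessageStore({3: "older message", 4: "newest message"}, capacity=2)
deleted_id = free_space_by_deleting_latest(store)
```

With this sequence the deleted ID is 4, which matches the user's instruction, whereas the current ground truth removes ID 3.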
2. Ambiguity in the todo-list Description
The current description states:
"First, help Frank place a food delivery order at 'Hema Fresh,' ordering two 'Fresh Gift Packs.'"
The phrase "help Frank" is ambiguous. It can easily be read as requiring the agent to log in to Frank's account, whereas the context implies that the current user (Eve) is placing the order.
💡Suggested Fix: I suggest removing "help Frank" to avoid confusion regarding account switching. The sentence would be clearer as:
"First, place a food delivery order at 'Hema Fresh,' ordering two 'Fresh Gift Packs.'"
3. Unnatural Dependency leading to "Goal Dropping"
The current environment enforces a constraint where `send_message()` fails when the inbox is full. This is counter-intuitive and contradicts the standard logic of email systems, where sending is independent of inbox storage.
The Core Problem: Because this dependency is unnatural, the Agent lacks the prior knowledge to infer that "Cleaning the Inbox" is a prerequisite for "Sending". When the Agent receives the error, it correctly switches focus to the sub-goal (deleting a message). However, once the deletion is successful, the Agent fails to link this back to the original goal (sending). It assumes the task is "Handling the storage issue" and terminates the conversation, resulting in failures.
💡 Suggested Fix: Since the constraint is non-standard, the environment must explicitly bridge this logical gap. The error message should serve as a chain-of-thought prompt to remind the Agent of the suspended original goal.
Change Error Message From:
...You need to ask the user which message to delete.
to
...You need to ask the user which message to delete and resend the message.
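As a rough illustration of the flow the revised message is meant to encourage, the sketch below fails the send, frees a slot, and then returns to the original goal. `MessageEnv`, `InboxFullError`, `delete_message()`, and `send_with_recovery()` are hypothetical names used for illustration only; the leading `...` in the error text is kept as elided above.

```python
class InboxFullError(Exception):
    """Hypothetical error type for the inbox-full constraint."""


class MessageEnv:
    # Hypothetical environment; only send_message() is named in this issue.
    def __init__(self, inbox, capacity):
        self.inbox = dict(inbox)  # message_id -> text
        self.capacity = capacity
        self.sent = []

    def send_message(self, text):
        if len(self.inbox) >= self.capacity:
            # Proposed wording: the error itself names the suspended goal.
            raise InboxFullError(
                "...You need to ask the user which message to delete "
                "and resend the message."
            )
        self.sent.append(text)

    def delete_message(self, message_id):
        del self.inbox[message_id]


def send_with_recovery(env, text, ask_user_which_to_delete):
    """Try to send; on a full inbox, delete the chosen message and resend."""
    try:
        env.send_message(text)
    except InboxFullError:
        env.delete_message(ask_user_which_to_delete())
        # The step agents currently drop: return to the original goal.
        env.send_message(text)


env = MessageEnv({3: "older message", 4: "newest message"}, capacity=2)
send_with_recovery(env, "order placed", lambda: 4)
```

The point of the sketch is the final `send_message()` call inside the `except` branch: the task is complete only once the original message has been sent, not once the deletion succeeds.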
Thanks!