Python: Fix IndexError when reasoning models produce reasoning-only messages in Magentic-One workflow#4413
Conversation
…t#4384) In _prepare_message_for_openai(), the text_reasoning case unconditionally accessed all_messages[-1] to attach reasoning_details. When a reasoning model (e.g. gpt-5-mini) returns reasoning_details without text content, all_messages is empty, causing an IndexError. Guard the access by initializing all_messages with the current args dict when it is empty, so reasoning_details can be safely attached. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 64%
✓ Correctness
This fix addresses an IndexError when text_reasoning content appears before any other content (so all_messages is still empty). The fix appends args to all_messages when the list is empty, preventing the crash on all_messages[-1]. The tests adequately cover the regression cases. However, there is a potential correctness concern: if text_reasoning appears before text content, args is appended to all_messages by the new guard, and if later content-processing logic also appends args to all_messages (at end-of-loop or in another case branch), the same dict reference would appear twice in the output list. The test for the mixed case (test_prepare_message_with_text_reasoning_before_text) uses a very loose assertion (len >= 1) which could be masking this double-append issue.
✓ Security Reliability
This diff fixes an IndexError when
all_messagesis empty and atext_reasoningcontent is processed. The fix appendsargstoall_messageswhen the list is empty, preventing the crash. There are no security issues (no new injection surfaces, no secrets, no unsafe deserialization beyond pre-existingjson.loads). The main reliability concern is thatargscould potentially be appended twice—once in thetext_reasoningguard and once in a subsequent content handler—leading to a duplicate entry in the output. The test for mixed ordering uses a looselen(prepared) >= 1assertion that could mask such duplication bugs.
✓ Test Coverage
The two new tests directly cover the regression fix (empty
all_messageswhentext_reasoningis the first content item). The first test (test_prepare_message_with_only_text_reasoning_content) is well-structured with strong assertions. The second test (test_prepare_message_with_text_reasoning_before_text) uses weak assertions (len >= 1,any(...)) that would pass even if the text content were silently dropped, reducing its value as a regression guard.
Suggestions
- In test_prepare_message_with_text_reasoning_before_text, the assertion
assert len(prepared) >= 1is unusually loose. Consider asserting the exact expected length (e.g.,== 1or== 2) to verify the fix doesn't produce duplicate messages when reasoning precedes text content. A loose assertion could mask a double-append bug whereargsis added toall_messagesonce by the new guard and again by the normal text-content path. - Verify that the rest of
_prepare_message_for_openaidoes not also appendargstoall_messagesafter iterating content items (e.g., at the end of the content loop). If it does, theif not all_messages: all_messages.append(args)guard would causeargsto appear twice in the result when reasoning content precedes text content. - Consider whether
argscan be appended toall_messagesa second time by a subsequent content case handler (e.g., the default_case for text content), which would produce a duplicate message entry. If so, a guard likeif args not in all_messagesor tracking anappendedflag would be more robust. - The test
test_prepare_message_with_text_reasoning_before_textusesassert len(prepared) >= 1, which is too loose to catch duplicate-entry bugs. Consider asserting the exact expected length and verifying each element's structure. - The second test (
test_prepare_message_with_text_reasoning_before_text) should assert the exact expected length and verify the text content ('The answer is 42.') is present in the prepared output, rather than using loose checks likelen(prepared) >= 1andany(...). - Consider adding a test for multiple
text_reasoningcontents without any text content, to verify the fix handles repeated reasoning-only entries correctly.
Automated review by moonbox3's agents
Python Test Coverage Report •
Python Unit Test Overview
|
||||||||||||||||||||||||||||||
There was a problem hiding this comment.
Pull request overview
Fixes an IndexError in the Python OpenAI chat client message preparation path when reasoning-capable models emit text_reasoning content without preceding text in Magentic-One workflows.
Changes:
- Guard
_prepare_message_for_openaiagainst emptyall_messageswhen attachingreasoning_details. - Add regression tests for reasoning-only messages and reasoning-before-text ordering.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/openai/_chat_client.py | Prevents IndexError when text_reasoning appears before any message content. |
| python/packages/core/tests/openai/test_openai_chat_client.py | Adds regression coverage for the previously-crashing scenarios. |
…icrosoft#4384) - Buffer pending reasoning details and attach to the next message with content/tool_calls, avoiding standalone reasoning-only messages. - When reasoning is the only content, emit a message with empty content to satisfy Chat Completions schema requirements. - Strengthen test assertions to verify text+reasoning co-location and that all messages with reasoning_details also have content or tool_calls. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 89%
✓ Correctness
The change correctly fixes a bug where
text_reasoningcontent appearing before any other content in a message would create an invalid Chat Completions payload (a message appended toall_messageswithoutcontentortool_calls). The new buffering approach viapending_reasoningproperly defers the reasoning details until a real content or tool_calls message is ready, and the fallback for reasoning-only messages emits a valid payload with empty content. The updated tests are more precise and verify the structural correctness of the output. No correctness issues found.
✓ Security Reliability
The diff fixes a reliability bug where reasoning-only content could produce an invalid Chat Completions payload (missing the 'content' field). The changes buffer pending reasoning and attach it to the next substantive message, with a fallback that emits an empty-content message. No new security concerns: json.loads is called on internal framework-owned protected_data (not a user-controlled trust boundary), no secrets are introduced, and no resource leaks are present. The only minor reliability observation is that consecutive text_reasoning entries would silently overwrite the buffer, but this mirrors the prior behavior and is unlikely in practice.
✓ Test Coverage
The updated tests meaningfully strengthen assertions for the reasoning-before-text and reasoning-only scenarios, replacing weak 'any message has reasoning' checks with precise structural validation. However, the diff introduces a code path where pending_reasoning attaches to a message with tool_calls (not just content), and this path has no test coverage. Additionally, there is no test for the edge case where multiple consecutive text_reasoning contents appear (the second silently overwrites the first via pending_reasoning reassignment), nor for the author_name/role handling in the trailing fallback block.
Suggestions
- If multiple
text_reasoningcontents appear before the first content/tool_calls item,pending_reasoningis silently overwritten, losing earlier reasoning blocks. Consider accumulating into a list if that scenario is possible. - If multiple consecutive text_reasoning contents are possible, consider accumulating pending_reasoning into a list rather than overwriting, to avoid silent data loss. The old code had the same limitation, so this is pre-existing.
- Add a test for text_reasoning appearing before a tool-call content (e.g., FunctionCallContent). The production code attaches pending_reasoning when 'tool_calls' is in args, but no test exercises this branch.
- Add a test for multiple consecutive text_reasoning contents before any text/tool content. Currently pending_reasoning is silently overwritten; a test would document whether this is intentional or a data-loss bug.
- Add a test verifying the fallback message includes 'name' when message.author_name is set and role != 'tool', and omits it when role == 'tool'. The new fallback block at lines 597-600 has this logic but no dedicated coverage.
Automated review by moonbox3's agents
…ft#4384) - Always buffer reasoning into pending_reasoning instead of conditionally attaching to the previous message via fragile all_messages emptiness check - Attach buffered reasoning to last message at end-of-loop when no subsequent content consumed it - Assert exact content values (content == '' not in ('', None)) - Assert exact list lengths (== 1 not >= 1) for stronger regression guards - Add test for reasoning before FunctionCallContent Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
When reasoning models (e.g. gpt-5-mini) are used inside Magentic-One workflows, they can produce messages containing only
text_reasoningcontent without any preceding text content. This caused anIndexErrorbecause the code assumedall_messageswas non-empty when attachingreasoning_details.Fixes #4384
Description
The root cause was in
_prepare_message_for_openaiin_chat_client.py: thetext_reasoningcase unconditionally accessedall_messages[-1]to attach reasoning details, but when reasoning content appeared before any text content, the list was empty. The fix initializes the message entry inall_messagesif it hasn't been created yet before attaching reasoning details. Two regression tests are added covering reasoning-only messages and reasoning-before-text ordering.Contribution Checklist