Python: Fix IndexError when reasoning models produce reasoning-only messages in Magentic-One workflow by moonbox3 · Pull Request #4413 · microsoft/agent-framework

moonbox3 · 2026-03-03T08:47:41Z

Motivation and Context

When reasoning models (e.g. gpt-5-mini) are used inside Magentic-One workflows, they can produce messages containing only text_reasoning content without any preceding text content. This caused an IndexError because the code assumed all_messages was non-empty when attaching reasoning_details.

Fixes #4384

Description

The root cause was in _prepare_message_for_openai in _chat_client.py: the text_reasoning case unconditionally accessed all_messages[-1] to attach reasoning details, but when reasoning content appeared before any text content, the list was empty. The fix initializes the message entry in all_messages if it hasn't been created yet before attaching reasoning details. Two regression tests are added covering reasoning-only messages and reasoning-before-text ordering.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…t#4384) In _prepare_message_for_openai(), the text_reasoning case unconditionally accessed all_messages[-1] to attach reasoning_details. When a reasoning model (e.g. gpt-5-mini) returns reasoning_details without text content, all_messages is empty, causing an IndexError. Guard the access by initializing all_messages with the current args dict when it is empty, so reasoning_details can be safely attached. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 3 | Confidence: 64%

✓ Correctness

This fix addresses an IndexError when text_reasoning content appears before any other content (so all_messages is still empty). The fix appends args to all_messages when the list is empty, preventing the crash on all_messages[-1]. The tests adequately cover the regression cases. However, there is a potential correctness concern: if text_reasoning appears before text content, args is appended to all_messages by the new guard, and if later content-processing logic also appends args to all_messages (at end-of-loop or in another case branch), the same dict reference would appear twice in the output list. The test for the mixed case (test_prepare_message_with_text_reasoning_before_text) uses a very loose assertion (len >= 1) which could be masking this double-append issue.

✓ Security Reliability

This diff fixes an IndexError when all_messages is empty and a text_reasoning content is processed. The fix appends args to all_messages when the list is empty, preventing the crash. There are no security issues (no new injection surfaces, no secrets, no unsafe deserialization beyond pre-existing json.loads). The main reliability concern is that args could potentially be appended twice—once in the text_reasoning guard and once in a subsequent content handler—leading to a duplicate entry in the output. The test for mixed ordering uses a loose len(prepared) >= 1 assertion that could mask such duplication bugs.

✓ Test Coverage

The two new tests directly cover the regression fix (empty all_messages when text_reasoning is the first content item). The first test (test_prepare_message_with_only_text_reasoning_content) is well-structured with strong assertions. The second test (test_prepare_message_with_text_reasoning_before_text) uses weak assertions (len >= 1, any(...)) that would pass even if the text content were silently dropped, reducing its value as a regression guard.

Suggestions

In test_prepare_message_with_text_reasoning_before_text, the assertion assert len(prepared) >= 1 is unusually loose. Consider asserting the exact expected length (e.g., == 1 or == 2) to verify the fix doesn't produce duplicate messages when reasoning precedes text content. A loose assertion could mask a double-append bug where args is added to all_messages once by the new guard and again by the normal text-content path.
Verify that the rest of _prepare_message_for_openai does not also append args to all_messages after iterating content items (e.g., at the end of the content loop). If it does, the if not all_messages: all_messages.append(args) guard would cause args to appear twice in the result when reasoning content precedes text content.
Consider whether args can be appended to all_messages a second time by a subsequent content case handler (e.g., the default _ case for text content), which would produce a duplicate message entry. If so, a guard like if args not in all_messages or tracking an appended flag would be more robust.
The test test_prepare_message_with_text_reasoning_before_text uses assert len(prepared) >= 1, which is too loose to catch duplicate-entry bugs. Consider asserting the exact expected length and verifying each element's structure.
The second test (test_prepare_message_with_text_reasoning_before_text) should assert the exact expected length and verify the text content ('The answer is 42.') is present in the prepared output, rather than using loose checks like len(prepared) >= 1 and any(...).
Consider adding a test for multiple text_reasoning contents without any text content, to verify the fix handles repeated reasoning-only entries correctly.

Automated review by moonbox3's agents

python/packages/core/agent_framework/openai/_chat_client.py

python/packages/core/tests/openai/test_openai_chat_client.py

python/packages/core/agent_framework/openai/_chat_client.py

python/packages/core/tests/openai/test_openai_chat_client.py

markwallace-microsoft · 2026-03-03T08:50:22Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/core/agent_framework/openai
_chat_client.py	292	26	91%	210, 240–241, 245, 363, 370, 446–453, 455–458, 468, 546, 548, 565, 603, 629, 645, 685
TOTAL	22264	2758	87%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
4728	247 💤	0 ❌	0 🔥	1m 21s ⏱️

Copilot

Pull request overview

Fixes an IndexError in the Python OpenAI chat client message preparation path when reasoning-capable models emit text_reasoning content without preceding text in Magentic-One workflows.

Changes:

Guard _prepare_message_for_openai against empty all_messages when attaching reasoning_details.
Add regression tests for reasoning-only messages and reasoning-before-text ordering.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
python/packages/core/agent_framework/openai/_chat_client.py	Prevents `IndexError` when `text_reasoning` appears before any message content.
python/packages/core/tests/openai/test_openai_chat_client.py	Adds regression coverage for the previously-crashing scenarios.

python/packages/core/tests/openai/test_openai_chat_client.py

python/packages/core/agent_framework/openai/_chat_client.py

…icrosoft#4384) - Buffer pending reasoning details and attach to the next message with content/tool_calls, avoiding standalone reasoning-only messages. - When reasoning is the only content, emit a message with empty content to satisfy Chat Completions schema requirements. - Strengthen test assertions to verify text+reasoning co-location and that all messages with reasoning_details also have content or tool_calls. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 3 | Confidence: 89%

✓ Correctness

The change correctly fixes a bug where text_reasoning content appearing before any other content in a message would create an invalid Chat Completions payload (a message appended to all_messages without content or tool_calls). The new buffering approach via pending_reasoning properly defers the reasoning details until a real content or tool_calls message is ready, and the fallback for reasoning-only messages emits a valid payload with empty content. The updated tests are more precise and verify the structural correctness of the output. No correctness issues found.

✓ Security Reliability

The diff fixes a reliability bug where reasoning-only content could produce an invalid Chat Completions payload (missing the 'content' field). The changes buffer pending reasoning and attach it to the next substantive message, with a fallback that emits an empty-content message. No new security concerns: json.loads is called on internal framework-owned protected_data (not a user-controlled trust boundary), no secrets are introduced, and no resource leaks are present. The only minor reliability observation is that consecutive text_reasoning entries would silently overwrite the buffer, but this mirrors the prior behavior and is unlikely in practice.

✓ Test Coverage

The updated tests meaningfully strengthen assertions for the reasoning-before-text and reasoning-only scenarios, replacing weak 'any message has reasoning' checks with precise structural validation. However, the diff introduces a code path where pending_reasoning attaches to a message with tool_calls (not just content), and this path has no test coverage. Additionally, there is no test for the edge case where multiple consecutive text_reasoning contents appear (the second silently overwrites the first via pending_reasoning reassignment), nor for the author_name/role handling in the trailing fallback block.

Suggestions

If multiple text_reasoning contents appear before the first content/tool_calls item, pending_reasoning is silently overwritten, losing earlier reasoning blocks. Consider accumulating into a list if that scenario is possible.
If multiple consecutive text_reasoning contents are possible, consider accumulating pending_reasoning into a list rather than overwriting, to avoid silent data loss. The old code had the same limitation, so this is pre-existing.
Add a test for text_reasoning appearing before a tool-call content (e.g., FunctionCallContent). The production code attaches pending_reasoning when 'tool_calls' is in args, but no test exercises this branch.
Add a test for multiple consecutive text_reasoning contents before any text/tool content. Currently pending_reasoning is silently overwritten; a test would document whether this is intentional or a data-loss bug.
Add a test verifying the fallback message includes 'name' when message.author_name is set and role != 'tool', and omits it when role == 'tool'. The new fallback block at lines 597-600 has this logic but no dedicated coverage.

Automated review by moonbox3's agents

python/packages/core/tests/openai/test_openai_chat_client.py

…ft#4384) - Always buffer reasoning into pending_reasoning instead of conditionally attaching to the previous message via fragile all_messages emptiness check - Attach buffered reasoning to last message at end-of-loop when no subsequent content consumed it - Assert exact content values (content == '' not in ('', None)) - Assert exact list lengths (== 1 not >= 1) for stronger regression guards - Add test for reasoning before FunctionCallContent Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 3, 2026 08:47

markwallace-microsoft added the python label Mar 3, 2026

Copilot started reviewing on behalf of moonbox3 March 3, 2026 08:48 View session

moonbox3 commented Mar 3, 2026

View reviewed changes

Copilot AI reviewed Mar 3, 2026

View reviewed changes

python/packages/core/tests/openai/test_openai_chat_client.py Outdated Show resolved Hide resolved

python/packages/core/tests/openai/test_openai_chat_client.py Show resolved Hide resolved

python/packages/core/agent_framework/openai/_chat_client.py Outdated Show resolved Hide resolved

moonbox3 commented Mar 3, 2026

View reviewed changes

python/packages/core/tests/openai/test_openai_chat_client.py Outdated Show resolved Hide resolved

python/packages/core/tests/openai/test_openai_chat_client.py Show resolved Hide resolved

moonbox3 and others added 2 commits March 3, 2026 18:47

Apply pre-commit auto-fixes

7136c86

moonbox3 enabled auto-merge March 3, 2026 09:55

moonbox3 self-assigned this Mar 3, 2026

eavanvalkenburg approved these changes Mar 3, 2026

View reviewed changes

TaoChenOSU approved these changes Mar 3, 2026

View reviewed changes

moonbox3 added this pull request to the merge queue Mar 3, 2026

Merged via the queue into microsoft:main with commit dae3caa Mar 3, 2026
29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Fix IndexError when reasoning models produce reasoning-only messages in Magentic-One workflow#4413

Python: Fix IndexError when reasoning models produce reasoning-only messages in Magentic-One workflow#4413
moonbox3 merged 4 commits intomicrosoft:mainfrom
moonbox3:agent/fix-4384-1

moonbox3 commented Mar 3, 2026

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markwallace-microsoft commented Mar 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

moonbox3 commented Mar 3, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

markwallace-microsoft commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

Suggestions

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

markwallace-microsoft commented Mar 3, 2026 •

edited

Loading