Python: Allow @tool functions to return rich content (images, audio) #4331
giles17 wants to merge 13 commits into microsoft:main from giles/tool-rich-content-results
…udio) Add support for tool functions to return Content objects that the model can perceive natively. Closes microsoft#4272 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR enables @tool-decorated functions to return rich content (images, audio, files) that models can perceive natively, rather than having them serialized to JSON strings. This addresses issue #4272 by allowing vision-in-the-loop workflows where tools like capture_screenshot() or render_chart() can feed image content back into the model for analysis.
Changes:
- Core framework now preserves Content objects with rich media instead of JSON-serializing them
- Added an `items` field to `function_result` Content to carry rich media alongside text results
- Updated all 6 provider implementations to handle rich content (OpenAI Responses, OpenAI Chat, and Anthropic support it natively; Bedrock, Ollama, and Azure AI log warnings)
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| python/packages/core/agent_framework/_types.py | Added items parameter to Content.__init__ and from_function_result() to store rich media items; updated to_dict() to serialize items |
| python/packages/core/agent_framework/_tools.py | Updated parse_result() to return str or list[Content] instead of always serializing; added _build_function_result() helper to separate text and rich items; updated invoke() return type |
| python/packages/core/agent_framework/_mcp.py | Updated _parse_tool_result_from_mcp() to return list[Content] for results containing images/audio instead of JSON strings |
| python/packages/core/agent_framework/openai/_responses_client.py | Injects rich items as separate user message with input_image content after function_call_output |
| python/packages/core/agent_framework/openai/_chat_client.py | Formats tool message content as multi-part array with text and image_url/input_audio/file parts when items present |
| python/packages/anthropic/agent_framework_anthropic/_chat_client.py | Formats rich items as native image blocks in tool_result content array; handles both data and uri image types |
| python/packages/bedrock/agent_framework_bedrock/_chat_client.py | Logs warning when rich items present (Bedrock doesn't support them); omits items from tool result |
| python/packages/ollama/agent_framework_ollama/_chat_client.py | Logs warning when rich items present (Ollama doesn't support them); omits items from tool result |
| python/packages/azure-ai/agent_framework_azure_ai/_chat_client.py | Logs warning when rich items present (Azure AI Agents doesn't support them); omits items from tool output |
| python/packages/core/tests/core/test_types.py | Added 8 new tests for parse_result(), _build_function_result(), and Content.from_function_result() with items; updated 2 existing tests to expect list[Content] instead of JSON |
| python/packages/core/tests/core/test_mcp.py | Updated test_parse_tool_result_from_mcp to expect list[Content] for results with images; added test_parse_tool_result_from_mcp_audio_content |
eavanvalkenburg
left a comment
We recently made the switch to restrict return types, and one of the reasons was performance: the constant parsing of these results, both for OTel and for the client, is a bit wasteful. Could you have a look at whether a cache could be used in the parsing function in the different places? We also need integration testing for this, because OpenAI Chat shouldn't support it, so let's be sure: test with OpenAI, Azure OpenAI, Ollama, Foundry Local, and maybe others that derive from the OpenAI chat client.
This is also #2513
…esult, fix Chat client
- Preserve original content order in MCP tool results instead of text-first
- Move _build_function_result logic into Content.from_function_result()
- Chat Completions: inject user message for rich items (API only supports string tool content)
- Update tests for ordering and new from_function_result behavior
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tool functions to return rich content (images, audio)
- Responses client: put rich items directly in function_call_output's output field as a list (native API support) instead of user message injection
- Chat client: warn and omit rich items (API doesn't support multi-part tool results), matching Ollama/Bedrock pattern
- Unify test image: use sample_image.jpg across all integration tests
- Add Azure OpenAI Responses integration test
- Assert model describes house image to verify perception
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 86%
✗ Correctness
This PR adds rich content (images, audio) support in tool results across multiple LLM provider clients. The implementation is well-structured with proper tests. The main correctness issue is a missing test asset file: the Anthropic integration test references `sample_image.jpg` in its own `tests/assets/` directory, but the diff only adds this file under `python/packages/core/tests/assets/`. The Azure and OpenAI tests correctly use `parent.parent` to reach the core assets directory, but the Anthropic test uses `parent`, which resolves to a non-existent path. The remaining changes are logically sound, with appropriate fallback/warning behavior for providers that don't support rich tool results.
✓ Security & Reliability
This PR adds rich content (images, audio) support to tool results across multiple LLM provider clients. The implementation is generally sound, with appropriate fallback warnings for unsupported providers. There are no critical security issues, but there are a few reliability edge cases: `Content.from_function_result` lacks validation when `result` is a list, which can cause `AttributeError` on non-Content items; the Anthropic client can send an empty `content` array to the API if all rich items are unsupported; and the OpenAI Chat Completions client introduces a `continue` that may alter the original message-building control flow.
✗ Test Coverage
This diff adds rich content (images, audio) support in tool results across all providers. Core types and parse_result logic have solid unit tests (test_types.py), and MCP parsing is well-covered (test_mcp.py). However, the provider-specific formatting logic for rich content — the most complex new code — lacks unit tests entirely. The Anthropic client's new branching logic in _prepare_message_for_anthropic (data images, URI images, unsupported types) has zero unit tests. The OpenAI Responses client's new output_parts building in _prepare_content_for_openai also has no unit tests. The OpenAI Chat Completions client changed control flow (added continue statement) with no test verifying the warning/behavior with items. All three only have integration tests marked @pytest.mark.flaky, which won't catch regressions in normal CI runs.
Blocking Issues
- The Anthropic integration test will fail with FileNotFoundError: `Path(__file__).parent / "assets" / "sample_image.jpg"` resolves to `python/packages/anthropic/tests/assets/sample_image.jpg`, but the image file is only added at `python/packages/core/tests/assets/sample_image.jpg`. Either copy the asset to the Anthropic tests directory or fix the path.
- No unit tests for the Anthropic `_prepare_message_for_anthropic` rich content handling. The new branching logic (lines 716-753 of `_chat_client.py`) covers three distinct paths (data images, URI images, and unsupported types), none of which are tested. The existing `test_prepare_message_for_anthropic_function_result` only covers the plain-text fallback path.
- No unit tests for OpenAI Responses _prepare_content_for_openai rich content in function results. The new output_parts construction (lines 1214-1224 of _responses_client.py) recursively calls _prepare_content_for_openai for each item with no test coverage. Only a flaky integration test covers this path.
- The OpenAI Chat Completions client (lines 578-583 of openai/_chat_client.py) changed the control flow for ALL function_result messages by adding an explicit append+continue, and added a warning path for items. There is no unit test verifying that function results with items produce a warning and that the result is still correctly appended.
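The missing warning-path test called out above is straightforward to write. A self-contained sketch of the idea, with a stand-in `format_tool_result` that logs and omits rich items the way the review describes the Bedrock/Ollama/Azure AI clients doing (the real clients and pytest's `caplog` fixture would replace the manual handler here):

```python
# Sketch of a unit test for the "warn and omit rich items" path.
# `format_tool_result` is a hypothetical stand-in for a provider client;
# in a real test you would call the client and use pytest's caplog.
import logging

logger = logging.getLogger("provider_sketch")


def format_tool_result(text: str, items: list) -> dict:
    if items:
        logger.warning("rich items not supported; omitting %d item(s)", len(items))
    return {"toolResult": {"content": [{"text": text}]}}


# Capture log records with a bare handler (caplog automates this).
records: list[logging.LogRecord] = []
handler = logging.Handler()
handler.emit = records.append
logger.addHandler(handler)

out = format_tool_result("done", ["<image>"])
assert records and records[0].levelno == logging.WARNING
assert out["toolResult"]["content"][0]["text"] == "done"
```

The point of the assertion pair is the one the review makes: the warning must be emitted *and* the textual result must still be formatted correctly.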
Suggestions
- In `_tools.py` `parse_result`, a `Content` with `type="text"` and empty/None `text` will fall through to JSON serialization via `_make_dumpable`, which may produce unexpected results. Consider returning `""` for this edge case.
- In `_mcp.py`, consider using `Content.from_data` (with base64-decoded bytes) instead of `Content.from_uri` with a synthetic data: URI for `ImageContent`/`AudioContent`. This avoids downstream consumers needing to parse the data: URI back out.
- In `_types.py` `from_function_result`, the `isinstance(result, list)` branch assumes all items are `Content` objects (accesses `.type`, `.text`). If the list contains non-Content items (e.g., strings), this will raise `AttributeError`. Consider adding a guard like `all(isinstance(c, Content) for c in result)` or handling non-Content items gracefully, consistent with how `parse_result` does it.
- In the Anthropic `_chat_client.py`, if `content.items` is truthy but all items have unsupported media types and `content.result` is falsy, `tool_content` will be an empty list sent to the API. Consider falling back to the non-rich-content path or adding a text placeholder when `tool_content` is empty.
- Add a unit test for `Content.from_function_result` with a list containing only rich items (no text) to verify the result is an empty string and items are populated.
- Add unit tests for the warning log paths in Bedrock, Azure AI, and Ollama when `content.items` is non-empty, to ensure warnings are emitted and results are still correctly formatted.
- Consider adding a unit test for `FunctionTool.parse_result` with a list mixing Content and non-Content items to verify the `Content.from_text(str(item))` fallback path.
- The integration test assertions like `assert 'house' in response.text.lower()` are inherently fragile even with `@pytest.mark.flaky`. Consider asserting on structural properties (e.g., response contains text, tool was called) rather than model-generated content.
Automated review by moonbox3's agents
python/packages/anthropic/agent_framework_anthropic/_chat_client.py
- Add isinstance guard in from_function_result for non-Content lists
- Fix Anthropic empty tool_content fallback to string result
- Fix Content(type='text', text=None) edge case in parse_result
- Rewrite MCP _parse_tool_result_from_mcp as single-pass (no index counters)
- Add Anthropic unit tests: data image, uri image, unsupported media, all-unsupported
- Add OpenAI Chat unit test: rich items warning and omission
- Add OpenAI Responses unit tests: function_result with/without items
- Add test_types tests: only-rich-items list, non-Content list fallback
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Description
Closes #4272 and #2513
When a `@tool` function returns a `Content` object (e.g. `Content.from_data(image_bytes, "image/png")`), the framework now preserves it as rich content that the model can perceive natively, instead of serializing it to a JSON string.

Problem

Previously, `FunctionTool.parse_result()` serialized any `Content` return to JSON text via `_make_dumpable()`. The model received a text blob, not the actual image. The same issue existed in MCP tool results, where `ImageContent` was JSON-serialized.

Solution

Added an `items` field to `function_result` Content that carries rich `Content` objects (images, audio, files) alongside the text result. Providers format these items using their existing multi-modal content handling.

User API (no decorator changes needed):
Changes
Core framework:
- `_types.py`: Added `items` field to `Content`. Updated `from_function_result()` to accept `str | list[Content]` and split text from rich items internally.
- `_tools.py`: Updated `parse_result()` to preserve `Content` returns instead of JSON-serializing. Updated `invoke()` return type.
- `_mcp.py`: Updated `_parse_tool_result_from_mcp()` to return `list[Content]` for image/audio instead of JSON strings. Preserves original content ordering.

All 6 providers updated:
- OpenAI Responses: injects rich items as `input_image` after `function_call_output`
- Anthropic: formats rich items in the `tool_result` content array
- Bedrock, Ollama, Azure AI: log a warning and omit rich items

Tests: 8 new tests + 2 updated existing tests, all passing.