From 09456636dd08ce82701a1798c2867ccc1b23f5b1 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:10:58 -0400
Subject: [PATCH 1/9] docs(oss): add suggested models to Deep Agents models page

---
 src/oss/deepagents/models.mdx | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 8dfcd44dfb..4e307d7b90 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -93,6 +93,16 @@ To configure model-specific parameters, use @[init_chat_model] or instantiate a
 
 Deep agents work with any chat model that supports [tool calling](/oss/langchain/models#tool-calling). See [chat model integrations](/oss/integrations/chat) for the full list of supported providers.
 
+### Suggested models
+
+These models perform well on the Deep Agents eval suite, which tests basic agent operations. Passing these evals is necessary but not sufficient for strong performance on longer, more complex tasks.
+
+- **Anthropic**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
+- **OpenAI**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3
+- **Google**: gemini-3-flash-preview, gemini-3.1-pro-preview
+- **Open-weight models** (available via Baseten and Fireworks): GLM-5, Kimi-K2.5, MiniMax-M2.5
+- **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
+
 ## Learn more
 
 - [Models in LangChain](/oss/langchain/models): chat model features including tool calling, structured output, and multimodality

From f1d70e24ba14b5d7943a4b059c1117d8b8039a7b Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:12:24 -0400
Subject: [PATCH 2/9] docs(oss): add provider cross-links to suggested models

---
 src/oss/deepagents/models.mdx | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 4e307d7b90..c59901857b 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -97,10 +97,10 @@ Deep agents work with any chat model that supports [tool calling](/oss/langchain
 
 These models perform well on the Deep Agents eval suite, which tests basic agent operations. Passing these evals is necessary but not sufficient for strong performance on longer, more complex tasks.
 
-- **Anthropic**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
-- **OpenAI**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3
-- **Google**: gemini-3-flash-preview, gemini-3.1-pro-preview
-- **Open-weight models** (available via Baseten and Fireworks): GLM-5, Kimi-K2.5, MiniMax-M2.5
+- **[Anthropic](/oss/integrations/providers/anthropic)**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
+- **[OpenAI](/oss/integrations/providers/openai)**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3
+- **[Google](/oss/integrations/providers/google)**: gemini-3-flash-preview, gemini-3.1-pro-preview
+- **Open-weight models** (available via [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks)): GLM-5, Kimi-K2.5, MiniMax-M2.5
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 
 ## Learn more

From 612e25c2252eb3ad87146753ff47ff90e7f3ac55 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:14:32 -0400
Subject: [PATCH 3/9] docs(oss): add note about older models

---
 src/oss/deepagents/models.mdx | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index c59901857b..2cc9961a9c 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -103,6 +103,10 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **Open-weight models** (available via [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks)): GLM-5, Kimi-K2.5, MiniMax-M2.5
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 
+
+ Older models from these providers (e.g., GPT-4, Claude 3, Gemini 1.5) tend to perform worse on agentic tasks and are not recommended unless you've benchmarked them for your specific use case.
+
+
 ## Learn more
 
 - [Models in LangChain](/oss/langchain/models): chat model features including tool calling, structured output, and multimodality

From ef3bb702656cfeec4d0862aa73c38c261dfcf0cf Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:15:00 -0400
Subject: [PATCH 4/9] docs(oss): tweak older models note wording

---
 src/oss/deepagents/models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 2cc9961a9c..98b88c0a38 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -104,7 +104,7 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 

- Older models from these providers (e.g., GPT-4, Claude 3, Gemini 1.5) tend to perform worse on agentic tasks and are not recommended unless you've benchmarked them for your specific use case.
+ Older OpenAI, Anthropic, and Google models not listed above tend to perform worse on agentic tasks and are not recommended unless you've benchmarked them for your specific use case.

 
 ## Learn more

From ff54fbec567a052870abd00ea7de27752ba5a3d2 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:15:41 -0400
Subject: [PATCH 5/9] docs(oss): refine older models note wording

---
 src/oss/deepagents/models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 98b88c0a38..778c64d379 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -104,7 +104,7 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 

- Older OpenAI, Anthropic, and Google models not listed above tend to perform worse on agentic tasks and are not recommended unless you've benchmarked them for your specific use case.
+ Older OpenAI, Anthropic, and Google models not listed above tend to perform worse in our eval suite. If you want to use them, we recommend that you benchmark them on your particular use case.

 
 ## Learn more

From 8f8938f9541d0c4b32aa5133dba7d8e729018162 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:16:05 -0400
Subject: [PATCH 6/9] docs(oss): remove older models note

---
 src/oss/deepagents/models.mdx | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 778c64d379..c59901857b 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -103,10 +103,6 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **Open-weight models** (available via [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks)): GLM-5, Kimi-K2.5, MiniMax-M2.5
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 
-
- Older OpenAI, Anthropic, and Google models not listed above tend to perform worse in our eval suite. If you want to use them, we recommend that you benchmark them on your particular use case.
-
-
 ## Learn more
 
 - [Models in LangChain](/oss/langchain/models): chat model features including tool calling, structured output, and multimodality

From fe24037f0fb0b928320c3c3499d1eff995607f95 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:17:51 -0400
Subject: [PATCH 7/9] docs(oss): recommend baseten/fireworks for hardware performance

---
 src/oss/deepagents/models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index c59901857b..d02c8e3ea1 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -100,7 +100,7 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **[Anthropic](/oss/integrations/providers/anthropic)**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
 - **[OpenAI](/oss/integrations/providers/openai)**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3
 - **[Google](/oss/integrations/providers/google)**: gemini-3-flash-preview, gemini-3.1-pro-preview
-- **Open-weight models** (available via [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks)): GLM-5, Kimi-K2.5, MiniMax-M2.5
+- **Open-weight models** (we recommend [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks) for hardware performance): GLM-5, Kimi-K2.5, MiniMax-M2.5
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 
 ## Learn more

From 9c5f0aa4be50a148fe55ffbc2b8f39f68bbc2463 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Sun, 8 Mar 2026 22:18:24 -0400
Subject: [PATCH 8/9] docs(oss): update baseten/fireworks wording to 'fast inference'

---
 src/oss/deepagents/models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index d02c8e3ea1..30a02a2032 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -100,7 +100,7 @@ These models perform well on the Deep Agents eval suite, which tests basic agent
 - **[Anthropic](/oss/integrations/providers/anthropic)**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
 - **[OpenAI](/oss/integrations/providers/openai)**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3
 - **[Google](/oss/integrations/providers/google)**: gemini-3-flash-preview, gemini-3.1-pro-preview
-- **Open-weight models** (we recommend [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks) for hardware performance): GLM-5, Kimi-K2.5, MiniMax-M2.5
+- **Open-weight models** (we recommend [Baseten](/oss/integrations/providers/baseten) and [Fireworks](/oss/integrations/providers/fireworks) for fast inference): GLM-5, Kimi-K2.5, MiniMax-M2.5
 - **Other open-weight models**: Qwen3.5-397B-A17B, Devstral-2-123B
 
 ## Learn more

From d8f6a033210941541e4b6ac062ff6ebaf6132dc8 Mon Sep 17 00:00:00 2001
From: Eugene Yurtsev
Date: Mon, 9 Mar 2026 18:23:38 -0400
Subject: [PATCH 9/9] docs: link deepagents eval suite

---
 src/oss/deepagents/models.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/oss/deepagents/models.mdx b/src/oss/deepagents/models.mdx
index 30a02a2032..fe6db262ee 100644
--- a/src/oss/deepagents/models.mdx
+++ b/src/oss/deepagents/models.mdx
@@ -95,7 +95,7 @@ Deep agents work with any chat model that supports [tool calling](/oss/langchain
 
 ### Suggested models
 
-These models perform well on the Deep Agents eval suite, which tests basic agent operations. Passing these evals is necessary but not sufficient for strong performance on longer, more complex tasks.
+These models perform well on the [Deep Agents eval suite](https://github.com/langchain-ai/deepagents/tree/main/libs/deepagents/tests/evals), which tests basic agent operations. Passing these evals is necessary but not sufficient for strong performance on longer, more complex tasks.
 
 - **[Anthropic](/oss/integrations/providers/anthropic)**: claude-opus-4-6, claude-opus-4-5, claude-sonnet-4-6, claude-sonnet-4, claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
 - **[OpenAI](/oss/integrations/providers/openai)**: gpt-5.4, gpt-4o, gpt-4.1, o4-mini, gpt-5.2-codex, gpt-4o-mini, o3