Conversation

@geelen (Contributor) commented Dec 2, 2025

Summary

I was getting the following error running evals against Gemini 3 Pro:

Task failed due to runtime error: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Function call is missing a thought_signature in functionCall parts. This is required for tools to work correctly, and missing thought_signature may lead to degraded model performance. Additional data, function call `default_api:meta__route` , position 2. Please refer to https://ai.google.dev/gemini-api/docs/thought-signatures for more details.', 'status': 'INVALID_ARGUMENT'}}

Turns out that was fixed last week: https://github.com/UKGovernmentBEIS/inspect_ai/pull/2819/files
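For context on the failure mode: Gemini 3 returns an opaque `thought_signature` alongside each `functionCall` part, and the client must echo it back verbatim when that turn is replayed in a later request; the linked upstream fix preserves these signatures in the conversation history. A minimal sketch of that rule, using plain dicts as stand-ins for REST payload parts (the camelCase field name is assumed from the REST API; this is not inspect_ai's actual code):

```python
def replay_assistant_turn(model_parts):
    """Copy a model turn into the next request's history, keeping any
    thoughtSignature attached to its functionCall parts intact."""
    replayed = []
    for part in model_parts:
        if "functionCall" in part:
            out = {"functionCall": part["functionCall"]}
            # Dropping this field is what produces the
            # 400 INVALID_ARGUMENT "missing a thought_signature" error.
            if "thoughtSignature" in part:
                out["thoughtSignature"] = part["thoughtSignature"]
            replayed.append(out)
        else:
            replayed.append(dict(part))
    return replayed


# Example: a prior model turn containing one signed tool call.
turn = [
    {
        "functionCall": {"name": "meta__route", "args": {"path": "/"}},
        "thoughtSignature": "CtQB...",  # opaque placeholder for model-issued bytes
    },
    {"text": "Routing the request."},
]
assert replay_assistant_turn(turn)[0]["thoughtSignature"] == "CtQB..."
```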

What are you adding?

  • Bug fix (non-breaking change which fixes an issue)
  • New benchmark/evaluation
  • New model provider
  • CLI enhancement
  • Performance improvement
  • Documentation update
  • API/SDK feature
  • Integration (CI/CD, tools)
  • Export/import functionality
  • Code refactoring
  • Breaking change
  • Other

Changes Made

Only `pyproject.toml` and `uv.lock` are modified.

Testing

  • I have run the existing test suite (pytest)
  • I have added tests for my changes
  • I have tested with multiple model providers (if applicable)
  • I have run pre-commit hooks (pre-commit run --all-files)

## mmlu (14,042 samples): groq/openai/gpt-oss-120b

accuracy  0.876
stderr    0.003

## gpqa_diamond (198 x 10 samples): groq/openai/gpt-oss-20b

| metric   | MAIN  | THIS PR |
| -------- | ----- | ------- |
| accuracy | 0.471 | 0.472   |
| stderr   | 0.031 | 0.031   |
| std      | 0.435 | 0.431   |

## humaneval (164 x 5 samples): groq/openai/gpt-oss-20b

| verify run | MAIN accuracy | MAIN stderr | THIS PR accuracy | THIS PR stderr |
| ---------- | ------------- | ----------- | ---------------- | -------------- |
| 1          | 0.944         | 0.013       | 0.944            | 0.014          |
| 2          | 0.944         | 0.013       | 0.944            | 0.014          |
| 3          | 0.974         | 0.010       | 0.970            | 0.011          |
| 4          | 0.988         | 0.009       | 0.988            | 0.009          |

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (if applicable)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Note

Upgrade core deps (inspect-ai 0.3.151, openai 2.8.0, mcp 1.22.0) and bump package version to 0.5.3 with refreshed lockfile and transitive updates.

  • Dependencies:
    • Update inspect-ai to 0.3.151 (adds frozendict, switches to nest-asyncio2).
    • Update openai to >=2.8.0.
    • Update mcp to >=1.22.0 (adds pyjwt[crypto], typing-extensions, typing-inspection; pulls in cryptography, cffi, pycparser).
    • Lockfile refresh updates transitive packages (e.g., inspect_swe to 0.2.27).
  • Project:
    • Bump package versions to 0.5.3 in pyproject.toml and packages/openbench-core/pyproject.toml.
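In `pyproject.toml` terms, the bumps above amount to a fragment like the following (a hedged sketch; the exact pin style for inspect-ai and the full dependency list come from the actual diff):

```toml
[project]
version = "0.5.3"
dependencies = [
    "inspect-ai>=0.3.151",  # carries the thought_signature fix
    "openai>=2.8.0",
    "mcp>=1.22.0",
]
```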

Written by Cursor Bugbot for commit 2cd4e13.

@socket-security bot commented Dec 2, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

| Diff    | Package                      | Supply Chain Security | Vulnerability | Quality | Maintenance | License |
| ------- | ---------------------------- | --------------------- | ------------- | ------- | ----------- | ------- |
| Updated | inspect-ai 0.3.141 ⏵ 0.3.151 | 74 (-26)              | 100           | 100     | 100         | 100     |
| Updated | mcp 1.13.1 ⏵ 1.22.0          | 99                    | 85            | 100     | 100         | 100     |
| Updated | openai 2.8.1 ⏵ 2.8.0         | 96                    | 100           | 100     | 100         | 100     |
| Added   | pycparser 2.23               | 97                    | 100           | 100     | 100         | 100     |
| Updated | anthropic 0.74.1 ⏵ 0.73.0    | 97                    | 100           | 100     | 100         | 100     |
| Added   | frozendict 2.4.7             | 100                   | 100           | 100     | 100         | 70      |
| Added   | nest-asyncio2 1.7.1          | 100                   | 100           | 100     | 100         | 100     |
| Updated | inspect-swe 0.2.26 ⏵ 0.2.27  | 100 (+1)              | 100           | 100     | 100         | 100     |
| Added   | pyjwt 2.10.1                 | 100                   | 100           | 100     | 100         | 100     |

View full report

@geelen geelen enabled auto-merge (squash) December 9, 2025 00:38
@geelen geelen force-pushed the gemini-3-inspect-ai-upgrade branch from 970ddbe to 2cd4e13 on December 16, 2025 00:25
@nmayorga7 (Collaborator) commented Dec 16, 2025

Can we test exercism and mathvista as well, please?
