Bump to gpt5 models #169

Open: Qard wants to merge 3 commits into main from gpt5


@Qard Qard commented Jan 30, 2026

No description provided.

@Qard Qard requested review from ankrgyl and ibolmo January 30, 2026 17:06
@Qard Qard self-assigned this Jan 30, 2026
@Qard Qard added the enhancement New feature or request label Jan 30, 2026

github-actions bot commented Feb 4, 2026

Braintrust eval report

Autoevals (gpt5-1770325173)

| Score | Average | Improvements | Regressions |
| --- | --- | --- | --- |
| NumericDiff | 79% (+1pp) | 10 🟢 | 6 🔴 |
| Time_to_first_token | 9.49tok (+8.16tok) | - | 119 🔴 |
| Llm_calls | 1.55 (+0) | - | - |
| Tool_calls | 0 (+0) | - | - |
| Errors | 0 (+0) | - | - |
| Llm_errors | 0 (+0) | - | - |
| Tool_errors | 0 (+0) | - | - |
| Prompt_tokens | 317.7tok (+0tok) | - | - |
| Prompt_cached_tokens | 0tok (+0tok) | - | - |
| Prompt_cache_creation_tokens | 0tok (+0tok) | - | - |
| Completion_tokens | 246.27tok (-2.7tok) | 47 🟢 | 51 🔴 |
| Completion_reasoning_tokens | 0tok (+0tok) | - | - |
| Total_tokens | 563.97tok (-2.7tok) | 47 🟢 | 51 🔴 |
| Estimated_cost | 0$ (+0$) | - | 119 🔴 |
| Duration | 10.01s (+4.61s) | 49 🟢 | 170 🔴 |
| Llm_duration | 10.84s (+8.11s) | - | 119 🔴 |

Qard and others added 2 commits February 6, 2026 04:15
- Remove temperature=0 from ragas tests (gpt-5 models don't support custom temperature)
- Add division by zero guard in ContextRecall for both JS and Python
- Mark ContextEntityRecall test as can_fail due to LLM non-determinism

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
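
The division-by-zero guard mentioned in the commit above can be illustrated with a minimal sketch. This is not the code from this PR: the function name `context_recall_score`, its input shape, and the choice to return `0.0` when there is nothing to score are all assumptions made for illustration.

```python
from typing import Sequence


def context_recall_score(attributed: Sequence[bool]) -> float:
    """Fraction of statements that were attributed to the context.

    Hypothetical helper; the real ContextRecall scorer may differ.
    """
    total = len(attributed)
    if total == 0:
        # Guard: the model returned no statements, so avoid dividing by zero.
        # Returning 0.0 here is an assumption, not the PR's confirmed behavior.
        return 0.0
    return sum(1 for a in attributed if a) / total
```

Without the guard, an empty statement list would raise `ZeroDivisionError` in Python (and yield `NaN` in JS) rather than produce a usable score.
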
GPT-5 models don't support custom temperature values. Removed the
default temperature=0 from parseArgs in ragas.ts and marked
ContextRecall test as can_fail due to LLM non-determinism.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
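
One way to drop a default `temperature=0` while still honoring an explicit value is to include the parameter only when the caller supplies it. The sketch below assumes a hypothetical helper `build_chat_args`; it is not the `parseArgs` function from ragas.ts, and the argument names are illustrative.

```python
from typing import Any, Dict, Optional


def build_chat_args(model: str, temperature: Optional[float] = None) -> Dict[str, Any]:
    """Build request arguments, omitting temperature unless explicitly set.

    Hypothetical helper for illustration only.
    """
    args: Dict[str, Any] = {"model": model}
    if temperature is not None:
        # Only forward temperature when the caller asked for one, so models
        # that reject custom values never receive the parameter.
        args["temperature"] = temperature
    return args
```

With this shape, `build_chat_args("gpt-5")` sends no `temperature` key at all, while a caller targeting a model that accepts it can still pass `temperature=0` explicitly.
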