LLM integration tests should be made more reliable #458

@jfeser

For example, test_image_input passes locally but is failing on #424. As far as I can tell, this doesn't indicate a bug.

A few options:

  • We could run the tests repeatedly and pass if a sufficient fraction pass.
  • We could trace a live LLM when writing the test and replay the trace when running it.
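To make the two options concrete, here is a rough sketch of what each could look like. Nothing here is taken from the project's code: `passes_fraction`, `ReplayingLLM`, `live_call`, and the JSON trace file are hypothetical names chosen for illustration, and the sketch assumes the LLM can be treated as a plain prompt-to-text callable.

```python
import functools
import hashlib
import json
from pathlib import Path


def passes_fraction(runs=5, min_fraction=0.6):
    """Option 1 (hypothetical helper): run a test `runs` times and
    pass if at least `min_fraction` of the runs succeed."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            failures = []
            for _ in range(runs):
                try:
                    test_fn(*args, **kwargs)
                except AssertionError as exc:
                    failures.append(exc)
            passed = runs - len(failures)
            if passed / runs < min_fraction:
                raise AssertionError(
                    f"only {passed}/{runs} runs passed; "
                    f"last failure: {failures[-1]!r}"
                )
        return wrapper
    return decorator


class ReplayingLLM:
    """Option 2 (hypothetical helper): wrap a prompt -> text callable,
    record live responses to a JSON file, and replay them later."""

    def __init__(self, live_call, trace_path, record=False):
        self.live_call = live_call          # assumed: callable(str) -> str
        self.trace_path = Path(trace_path)
        self.record = record
        self.trace = {}
        if self.trace_path.exists():
            self.trace = json.loads(self.trace_path.read_text())

    def __call__(self, prompt):
        # Key each response by a hash of the prompt text.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.trace:
            if not self.record:
                # In replay mode, never fall back to the live model.
                raise KeyError(f"no recorded response for prompt {prompt!r}")
            self.trace[key] = self.live_call(prompt)
            self.trace_path.write_text(json.dumps(self.trace, indent=2))
        return self.trace[key]
```

The first approach still calls the live model on every run, so it costs more and only lowers the chance of a spurious failure. The second is deterministic once a trace is recorded, but the trace has to be re-recorded whenever a prompt changes, and in this sketch image inputs would need their own keying scheme.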
