[Research] Benchmark different retrieval, search reasoning strategies #2

@shash42

We found a large boost from top-5 chunk retrieval using Qwen3-8b-embedding.
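For anyone picking this up, here is a minimal sketch of what top-k chunk retrieval by embedding similarity looks like. It is not our actual pipeline: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `retrieve_top_k`, `cosine`, and the example chunks are all made up for illustration.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(question, chunks, k=5, embed=toy_embed):
    """Return the k chunks most similar to the question (ties keep input order)."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(c)), i) for i, c in enumerate(chunks)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [chunks[i] for _, i in scored[:k]]

chunks = [
    "the central bank raised interest rates",
    "the football team won the cup",
    "interest rates may fall next quarter",
]
top = retrieve_top_k("will interest rates rise", chunks, k=2)
print(top)
```

Swapping `embed` for a call into a real embedding model is the only change needed to try other retrievers with the same interface.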

Getting better context is probably the highest-impact lever for forecasting. A good follow-up here would be experimenting with different retrieval and search strategies. Forecasting can be a great benchmark for reasoning-intensive retrieval.

In the limit, you could train a search agent that makes its own calls to the retrieval/search tools during reasoning. This matters because we should retrieve what the model is uncertain about, and that changes as new information is retrieved. The model probably knows best :)
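A rough sketch of the loop such a search agent would run, just to make the idea concrete. Everything here is an assumption for illustration: `toy_policy` is a hypothetical stand-in for the model deciding what it is still uncertain about, `keyword_search` stands in for a real retrieval/search tool, and `CORPUS` is invented data.

```python
CORPUS = [
    "Inflation fell to 2.1% in the latest report.",
    "The central bank meets again in March.",
    "The football final is on Sunday.",
]

def keyword_search(query, top_k=1):
    """Hypothetical search tool: case-insensitive substring match over CORPUS."""
    hits = [doc for doc in CORPUS if query.lower() in doc.lower()]
    return hits[:top_k]

def toy_policy(question, context):
    """Hypothetical stand-in for the model: ask about the first uncovered topic."""
    for topic in ("inflation", "central bank"):
        if not any(topic in c.lower() for c in context):
            return topic
    return None  # nothing left to look up

def run_search_agent(question, propose_query, search, max_steps=3):
    """Interleave query proposal and retrieval until the policy stops or steps run out."""
    context = []
    for _ in range(max_steps):
        query = propose_query(question, context)
        if query is None:
            break
        for chunk in search(query):
            if chunk not in context:
                context.append(chunk)
    return context

context = run_search_agent(
    "Will the central bank cut rates in March?", toy_policy, keyword_search
)
print(context)
```

The point of the structure is that each query depends on what has already been retrieved, unlike a single up-front top-k lookup. In a trained agent, `propose_query` would be the model itself.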

Let us know if you take this up and run into any issues!
