[Research] Benchmark different retrieval, search reasoning strategies #2

We found a large boost from top-5 chunk retrieval using Qwen3-Embedding-8B embeddings.
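For anyone picking this up, here is a minimal sketch of what top-k chunk retrieval over embeddings can look like. It is not the repo's actual implementation: the function name `top_k_chunks` is hypothetical, cosine similarity is an assumption about the scoring, and the vectors are assumed to come from whatever embedding model you plug in.

```python
import numpy as np


def top_k_chunks(query_vec, chunk_vecs, k=5):
    """Return (indices, scores) of the k chunks closest to the query.

    Hypothetical helper: query_vec has shape (d,), chunk_vecs has shape (n, d).
    Similarity is cosine, which is an assumption, not the repo's confirmed choice.
    """
    query_vec = np.asarray(query_vec, dtype=float)
    chunk_vecs = np.asarray(chunk_vecs, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    q = query_vec / (np.linalg.norm(query_vec) + eps)
    c = chunk_vecs / (np.linalg.norm(chunk_vecs, axis=1, keepdims=True) + eps)
    scores = c @ q  # cosine similarity of each chunk to the query
    k = min(k, scores.shape[0])
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order.tolist(), scores[order].tolist()


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    chunks = np.eye(6)
    query = np.array([0.0, 0.1, 0.0, 1.0, 0.0, 0.0])
    print(top_k_chunks(query, chunks, k=5))
```

Swapping the scoring function or the value of `k` is then a one-line change, which makes it a convenient baseline to compare other strategies against.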

Getting better context is probably the highest-impact lever for forecasting. A good follow-up here would be to experiment with different retrieval and search strategies. Forecasting could also serve as a great benchmark for [reasoning-intensive retrieval](https://arxiv.org/abs/2504.20595).
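One way to set up such a comparison is sketched below. Everything here is an assumption for illustration: the Brier score as the metric, the `compare_strategies` helper, and the idea that each strategy is wrapped as a callable that maps a question to a probability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a resolved question is (text, outcome in {0, 1}),
# and a forecaster wraps one retrieval strategy plus the model call.
Question = Tuple[str, int]
Forecaster = Callable[[str], float]


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between forecast probabilities and binary outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def compare_strategies(
    strategies: Dict[str, Forecaster], questions: List[Question]
) -> Dict[str, float]:
    """Score every strategy on the same resolved questions (lower is better)."""
    outcomes = [y for _, y in questions]
    return {
        name: brier_score([forecast(q) for q, _ in questions], outcomes)
        for name, forecast in strategies.items()
    }


if __name__ == "__main__":
    # Placeholder questions and forecasters, not real data or results.
    questions = [("Will X happen?", 1), ("Will Y happen?", 0)]
    strategies = {
        "no_retrieval": lambda q: 0.5,
        "top5_chunks": lambda q: 0.5,  # replace with a real pipeline
    }
    print(compare_strategies(strategies, questions))
```

Holding the question set and the forecasting model fixed while only the retrieval strategy changes keeps the comparison about retrieval rather than about the model.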

In the limit, you could train a [search agent](https://github.com/PeterGriffinJin/Search-R1) that makes its own calls to the retrieval/search tools during reasoning. This matters because we should retrieve what the model is uncertain about, which changes as new information is retrieved, and the model probably knows best :)
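To make the idea concrete, here is a rough sketch of the inference-time loop such an agent would run. It does not cover training and is not Search-R1's API: `run_search_agent`, the `policy` and `search` callables, and the `("search", query)` / `("answer", probability)` action format are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple, Union

# Hypothetical action format: ("search", query) or ("answer", probability).
Action = Tuple[str, Union[str, float]]
Policy = Callable[[str, List[str]], Action]  # the model deciding what to do next
Search = Callable[[str, int], List[str]]     # retrieval tool: (query, k) -> chunks


def run_search_agent(
    question: str,
    policy: Policy,
    search: Search,
    max_steps: int = 4,
    k: int = 5,
) -> Tuple[Optional[float], List[str]]:
    """Let the model alternate between searching and answering.

    Returns (forecast, context). The forecast is None if the step budget
    runs out before the policy answers.
    """
    context: List[str] = []
    for _ in range(max_steps):
        kind, payload = policy(question, context)
        if kind == "answer":
            return float(payload), context
        if kind != "search":
            raise ValueError(f"unknown action: {kind!r}")
        # The query is chosen by the model given everything retrieved so far,
        # so it can target whatever it is still uncertain about.
        context.extend(search(str(payload), k))
    return None, context
```

The key difference from one-shot top-k retrieval is that each query is conditioned on the context gathered so far, rather than being fixed by the original question.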

Let us know if you take this up and run into any issues!
