We found a large boost from our Qwen3-8b-embedding-based top-5 chunk retrieval.
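For readers who want to try something similar, here is a minimal sketch of top-k chunk retrieval by cosine similarity. It is not our actual pipeline: the function names `cosine` and `top_k_chunks` are hypothetical, and the embeddings are assumed to be precomputed by whatever embedding model you use and passed in as plain lists of floats.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k chunks whose embeddings are most similar to the query embedding.

    Assumption: chunk_vecs[i] is the precomputed embedding of chunks[i].
    """
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,  # most similar first
    )
    return [chunk for chunk, _ in scored[:k]]
```

With `k=5` this returns the five chunks to place in the model's context. A real setup would likely use a vector index rather than a full sort, but the ranking logic is the same.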
Getting better context is probably the highest-impact lever for forecasting. A good follow-up here would be to experiment with different retrieval/search strategies. Forecasting can be a great benchmark for reasoning-intensive retrieval.
In the limit, you could train a search agent that makes its own calls to the retrieval/search tools during reasoning. This could matter because we should retrieve what the model is uncertain about, which changes as new information is retrieved, and the model probably knows best :)
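To make the search-agent idea concrete, here is a rough sketch of the inference-time loop only; it does not cover training. Everything in it is an assumption for illustration: `propose_query` stands in for the model deciding what it is still uncertain about (returning `None` when it has enough), and `search` stands in for any retrieval/search tool.

```python
def iterative_retrieve(question, propose_query, search, max_steps=3):
    """Let the model drive retrieval: each new query depends on what was found so far.

    propose_query(question, context) -> str or None  (hypothetical model call)
    search(query) -> list of documents               (hypothetical search tool)
    """
    context = []
    for _ in range(max_steps):
        query = propose_query(question, context)
        if query is None:  # the model signals it has enough information
            break
        for doc in search(query):
            if doc not in context:  # skip documents we already retrieved
                context.append(doc)
    return context
```

The key difference from one-shot top-k retrieval is that the query is re-issued after every round, so what gets retrieved can track the model's changing uncertainty.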
Let us know if you take this up and run into any issues!