Agentic code search #141

adityasoni9998 · 2025-12-08T18:41:08Z

Benchmarking code to evaluate how well open-source LLMs can localize the source code files that must be edited to fix a given GitHub issue.
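As a rough illustration of what file-level localization scoring can look like, here is a minimal sketch. It is not taken from this PR: the function name `file_localization_scores` and the choice of precision/recall/F1 over sets of file paths are assumptions for illustration, and the benchmark's actual reward definitions may differ.

```python
from typing import Dict, Iterable


def file_localization_scores(
    predicted: Iterable[str], gold: Iterable[str]
) -> Dict[str, float]:
    """Hypothetical helper: compare predicted file paths against the gold files.

    Both inputs are treated as sets, so duplicates and ordering are ignored.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_positives = len(pred_set & gold_set)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # One of the two predicted files matches one of the two gold files.
    print(file_localization_scores(["a.py", "b.py"], ["a.py", "c.py"]))
```

Scoring on sets keeps the metric independent of the order in which the agent reports files.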

Note: DockerWorkspace does not work for me on the latest benchmarks repo, although it did work on an older version; LocalWorkspace works fine. I am using v1.4.1 of the software agent SDK with the ghcr.io/openhands/agent-server:latest-python Docker image, so there may have been breaking changes in a recent Docker image.

neubig

@adityasoni9998 please fix the GitHub CI and re-request review

neubig · 2025-12-31T02:44:05Z

@OpenHands fix failing github CI

neubig · 2026-01-08T13:43:06Z

@OpenHands merge main and fix the failing CI. Note that you will need to push to a remote fork to do so.

openhands-ai · 2026-01-08T13:43:16Z

I'm on it! neubig can track my progress at all-hands.dev

- Merge main branch to get latest changes
- Fix type errors in evaluate_instance and prepare_workspace method signatures
- Update _capture_conversation_archive to accept both RemoteWorkspace and LocalWorkspace
- Fix test_metrics.py to use LocalWorkspace mock for agentic_code_search benchmark
- Add get_instruction patch for agentic_code_search in test_metrics.py

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-01-08T13:59:05Z

Summary

I have successfully merged main into the agentic_code_search branch and fixed the failing CI issues. The changes have been pushed to the remote fork at adityasoni9998/benchmarks.

Completed Tasks:

Merged main branch - Resolved merge conflicts in:
- benchmarks/utils/evaluation.py - Combined the new resource_factor and forward_env parameters with the LocalWorkspace support
- vendor/software-agent-sdk - Updated to the latest commit from main (73769d5e)
Fixed Type Errors (pyright):
- Updated prepare_workspace method signature in benchmarks/utils/evaluation.py to include resource_factor and forward_env parameters while keeping RemoteWorkspace | LocalWorkspace return type
- Updated _capture_conversation_archive to accept RemoteWorkspace | LocalWorkspace and added early return for non-RemoteWorkspace
- Fixed evaluate_instance method signature in benchmarks/swebenchmultimodal/run_infer.py to use RemoteWorkspace | LocalWorkspace
- Fixed evaluate_instance and prepare_workspace method signatures in tests/test_iterative_resume.py
- Fixed prepare_workspace and evaluate_instance method signatures in benchmarks/agentic_code_search/run_infer.py
Fixed Test Failures:
- Updated tests/test_metrics.py to:
  - Add get_instruction patch for agentic_code_search benchmark (similar to swebench)
  - Use LocalWorkspace mock instead of RemoteWorkspace mock for agentic_code_search benchmark
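
The signature changes listed above follow a common pattern: widen the parameter type to a union and return early for the variant that has nothing to do. A minimal sketch of that pattern, assuming stand-in classes rather than the real SDK types (the two dataclasses, the `host` and `working_dir` fields, and the returned string are all hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RemoteWorkspace:
    """Stand-in for the SDK class of the same name (fields are hypothetical)."""

    host: str


@dataclass
class LocalWorkspace:
    """Stand-in for the SDK class of the same name (fields are hypothetical)."""

    working_dir: str


def capture_conversation_archive(
    workspace: RemoteWorkspace | LocalWorkspace,
) -> str | None:
    """Accept either workspace type; only remote workspaces produce an archive."""
    # Early return: a non-remote workspace has no remote archive to fetch.
    if not isinstance(workspace, RemoteWorkspace):
        return None
    # After the isinstance check, type checkers narrow `workspace` to
    # RemoteWorkspace, so accessing `host` passes static analysis.
    return f"archive requested from {workspace.host}"


if __name__ == "__main__":
    print(capture_conversation_archive(LocalWorkspace("/tmp/repo")))
    print(capture_conversation_archive(RemoteWorkspace("example-host")))
```

The `isinstance` guard is what lets a checker such as pyright accept remote-only attribute access below it while the signature still admits both types.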

Verification:

All 20 tests pass
All pre-commit checks pass (Ruff format, Ruff lint, pycodestyle, pyright)

PR Link:

#141

The CI should now pass with these fixes.

openhands-agent and others added 16 commits November 6, 2025 21:04

Update software-agent-sdk submodule to main

055e4e6

initial commit, eval for code search

a6ec978

Num runs should be managed by the user externally

36fa267

Update software-agent-sdk submodule to main

7d3d360

docker works

5bf46dd

example config for qwen3

1fc3cac

local runtime works

5f74f63

use host network in agent sdk

5e2820d

add eval

bfe182a

add eval

72ef6ff

add analysis code

b891149

module-level rewards

479c081

fine-grained rewards eval

86957d8

fine-grained rewards

fe75fb2

docker doesn't work but local does

64bb3ee

update README

db8e7bb

This was referenced Dec 8, 2025

Commit the file module in class level rewards. OpenHands/agentic-code-search-oss#19

Closed

Add efficiency metrics to the benchmarking code OpenHands/agentic-code-search-oss#26

Closed

neubig reviewed Dec 8, 2025

adityasoni9998 added 7 commits December 22, 2025 12:16

Merge branch 'main' into agentic_code_search

6b92366

revert to only allow local workspace in agentic code search

6d52715

minor code bug fix

76b4a01

Merge branch 'main' into agentic_code_search

dea232c

Update software-agent-sdk submodule to match trainer

a417dc6

update parser config

11ea94e

add dataset

7730bac
