BIRCH is a dedicated benchmarking platform for evaluating the capabilities of foundation models (FMs) in multi-hunk code repair. It incorporates realistic multi-hunk bug instances from the Defects4J dataset and supports both open-source and proprietary LLMs. BIRCH also categorizes multi-hunk bugs by complexity and provides a standardized interface for integrating and evaluating diverse repair techniques. The platform enables meaningful comparisons between LLMs and supports further research in multi-hunk program repair.
This repository contains three main experiments:
- LLM-Only Experiment
  All code for the LLM-only study can be found in the `birch_llm_prompting` folder. This folder is a symlink to the `birch` directory, which contains the code.
- Prompt Augmentations
  All code for the prompt-augmentation experiments can be found in the `birch_augmented_prompting` folder. This folder is a symlink to the `redwood` directory, which contains the code.
- Hunk4J
  Metadata, raw patch files, and the code used to extract the metadata can be found in the `Hunk4J` folder.
.
├── Hunk4J
│   ├── README.md
│   ├── code                      # Python scripts for JSON/metadata creation
│   │   └── utils                 # Helper utilities
│   ├── dataset                   # Multi-hunk metadata JSON files
│   ├── javaparser                # JavaParser-based AST context extractor
│   │   └── method-line-extractor
│   └── patches                   # Raw `.patch` files for bugs
│
├── birch
│   ├── README.md                 # LLM-only repair workflow instructions
│   ├── llm                       # LLM API wrappers and model definitions
│   ├── prompt_configurations     # Prompt templates (e.g., `prompts.toml`)
│   ├── prompts                   # Python prompt generators
│   ├── utils                     # Defects4J and LLM helper utilities
│   └── scripts …                 # Scripts for checkout, repair, validation, and result summarization
│
├── redwood
│   ├── README.md                 # Augmented-technique workflow instructions
│   ├── algorithms                # Similar-example retrieval, AST/embedding algorithms, etc.
│   ├── hunk4j_statistics         # Scripts & CSVs for multi-hunk descriptive statistics
│   ├── hunk_divergence           # Divergence computation & analysis (Python, R, plots, CSVs)
│   ├── proximity_class           # Spatial-proximity classification tools & plots
│   ├── prompt_configurations     # TOML configs for feedback/retrieval prompts
│   ├── prompts                   # Python modules for compiler-error & similar-result prompts
│   ├── results                   # Results for all experiments with `passed_bugs.json` and summaries
│   ├── solved_bugs_statistics    # CSV reports of which bugs each LLM solved (per scope)
│   └── utils                     # Feedback-loop and general utilities
│
├── birch-llm-prompting -> birch            # symlink (LLM-only workflow)
├── birch-augmented-prompting -> redwood    # symlink (augmented workflow)
│
├── images                        # Repository-wide image assets (e.g., birch-image.png)
└── README.md                     # (this file)
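
Because two of the top-level entries are symlinks, a checkout on a filesystem or archive tool that does not preserve symlinks can leave them missing or broken. The following is a minimal sketch of a sanity check, not part of the repository: the function name `check_symlinks` and the `EXPECTED_LINKS` mapping are hypothetical, and the link names are assumed to be the hyphenated ones shown in the tree (adjust them if your checkout uses a different spelling).

```python
from pathlib import Path

# Assumption: link names and targets are copied from the directory tree above.
EXPECTED_LINKS = {
    "birch-llm-prompting": "birch",
    "birch-augmented-prompting": "redwood",
}


def check_symlinks(repo_root):
    """Return {link_name: bool}, True when the link resolves to its expected directory."""
    root = Path(repo_root).resolve()
    status = {}
    for link_name, target_name in EXPECTED_LINKS.items():
        link = root / link_name
        target = root / target_name
        # The entry must be a symlink, the target must exist as a directory,
        # and both must resolve to the same real path.
        status[link_name] = (
            link.is_symlink()
            and target.is_dir()
            and link.resolve() == target.resolve()
        )
    return status
```

Calling `check_symlinks(".")` from the repository root returns a dictionary with one entry per link; a `False` value suggests the link should be recreated before running the corresponding workflow.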
