
nashid/birch


BIRCH: Benchmarking Infrastructure for Repairing Code Hunks 🚀

BIRCH is a benchmarking platform for evaluating how well foundation models (FMs) repair multi-hunk bugs, i.e., bugs whose fix spans several separate code hunks. It includes realistic multi-hunk bug instances from the Defects4J dataset and supports both open-source and proprietary LLMs. BIRCH also categorizes multi-hunk bugs by complexity and provides a standardized interface for integrating and evaluating different repair techniques. Together, these enable meaningful comparisons between LLMs and support further research on multi-hunk program repair.
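The paragraph above mentions a standardized interface for plugging in repair techniques. As a minimal sketch of what such an interface could look like (the names `BugInstance`, `RepairTechnique`, and `evaluate` and their fields are hypothetical, not BIRCH's actual API), a technique can be modeled as anything that maps a bug to a list of candidate multi-hunk patches, and evaluation as checking whether any candidate passes a caller-supplied validator:

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class BugInstance:
    """A multi-hunk bug to repair (hypothetical fields, not BIRCH's schema)."""
    bug_id: str                # e.g. a Defects4J-style "Project-N" identifier
    buggy_hunks: list[str]     # source text of each buggy hunk
    metadata: dict = field(default_factory=dict)


class RepairTechnique(Protocol):
    """Hypothetical interface: any object with a name and a repair() method."""
    name: str

    def repair(self, bug: BugInstance) -> list[list[str]]:
        """Return candidate patches; each candidate has one fixed text per hunk."""
        ...


def evaluate(
    technique: RepairTechnique,
    bugs: list[BugInstance],
    validate: Callable[[BugInstance, list[str]], bool],
) -> dict[str, bool]:
    """Map each bug_id to True if any candidate patch passes validation."""
    results: dict[str, bool] = {}
    for bug in bugs:
        candidates = technique.repair(bug)
        results[bug.bug_id] = any(validate(bug, c) for c in candidates)
    return results


if __name__ == "__main__":
    class UppercaseTechnique:
        """Toy technique: proposes one patch that upper-cases every hunk."""
        name = "uppercase"

        def repair(self, bug: BugInstance) -> list[list[str]]:
            return [[h.upper() for h in bug.buggy_hunks]]

    bugs = [BugInstance("Demo-1", ["a", "b"]), BugInstance("Demo-2", ["c"])]
    print(evaluate(UppercaseTechnique(), bugs, lambda bug, patch: patch == ["A", "B"]))
    # prints {'Demo-1': True, 'Demo-2': False}
```

Keeping the validator as a parameter lets the same loop drive an LLM-only technique or a feedback-based one; a real validator would presumably apply the patch to the Defects4J checkout, compile it, and run the tests, which this sketch leaves to the caller.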


This repository contains three main components:

  1. LLM-Only Experiment
    All code for the LLM-only study is in the birch_llm_prompting folder, which is a symlink to the birch directory containing the code.

  2. Prompt Augmentations
    All code for the prompt-augmentation experiments is in the birch_augmented_prompting folder, which is a symlink to the redwood directory containing the code.

  3. Hunk4J
    Metadata, raw patch files, and the code that extracts the metadata are in the Hunk4J folder.

Project Structure

.
├── Hunk4J
│   ├── README.md
│   ├── code                          # Python scripts for JSON/metadata creation
│   │   └── utils                     # Helper utilities
│   ├── dataset                       # Multi-hunk metadata JSON files
│   ├── javaparser                    # JavaParser-based AST context extractor
│   │   └── method-line-extractor
│   └── patches                       # Raw `.patch` files for bugs
│
├── birch
│   ├── README.md                     # LLM-only repair workflow instructions
│   ├── llm                           # LLM API wrappers and model definitions
│   ├── prompt_configurations         # Prompt templates (e.g., `prompts.toml`)
│   ├── prompts                       # Python prompt generators
│   ├── utils                         # Defects4J and LLM helper utilities
│   └── scripts …                     # Scripts for checkout, repair, validation, and result summarization
│
├── redwood
│   ├── README.md                     # Augmented-technique workflow instructions
│   ├── algorithms                    # Similar-example retrieval, AST/embedding algorithms, etc.
│   ├── hunk4j_statistics             # Scripts & CSVs for multi-hunk descriptive statistics
│   ├── hunk_divergence               # Divergence computation & analysis (Python, R, plots, CSVs)
│   ├── proximity_class               # Spatial-proximity classification tools & plots
│   ├── prompt_configurations         # TOML configs for feedback/retrieval prompts
│   ├── prompts                       # Python modules for compiler-error & similar-result prompts
│   ├── results                       # Results for all experiments with `passed_bugs.json` and summaries
│   ├── solved_bugs_statistics        # CSV reports of which bugs each LLM solved (per scope)
│   └── utils                         # Feedback-loop and general utilities
│
├── birch-llm-prompting -> birch          # symlink (LLM-only workflow)
├── birch-augmented-prompting -> redwood  # symlink (augmented workflow)
│
├── images                            # Repository-wide image assets (e.g., birch-image.png)
└── README.md                         # (this file)
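The `proximity_class` directory holds spatial-proximity classification tools, but the classes it actually uses are not listed in this README. The sketch below only illustrates the general idea under assumed categories (`single-hunk`, `multi-file`, `single-file-nearby`, `single-file-distant`) and an assumed line-gap threshold `max_gap`, all hypothetical: hunks in different files are classed as multi-file, and same-file hunks are classed by the largest gap between consecutive hunks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hunk:
    """One hunk of a patch (hypothetical representation)."""
    file: str
    start: int   # first line of the hunk in the original file
    length: int  # number of original lines the hunk spans


def classify_proximity(hunks: list[Hunk], max_gap: int = 20) -> str:
    """Illustrative proximity class; categories and max_gap are assumptions."""
    if len(hunks) < 2:
        return "single-hunk"
    if len({h.file for h in hunks}) > 1:
        return "multi-file"
    ordered = sorted(hunks, key=lambda h: h.start)
    # Gap = number of untouched lines between one hunk's end and the next start.
    gaps = [
        nxt.start - (cur.start + cur.length)
        for cur, nxt in zip(ordered, ordered[1:])
    ]
    if max(gaps) <= max_gap:
        return "single-file-nearby"
    return "single-file-distant"
```

For example, hunks at lines 10 to 12 and 20 to 21 of the same file leave a 7-line gap and fall under `single-file-nearby` with the default threshold. A real classifier could also use AST context (such as the enclosing method, which the `javaparser` extractor targets) rather than raw line distance.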
