SWE-Smith-DataScience

This project fine-tunes an open-source language model to specialize in data science tasks, using a subset of the SWE-Smith dataset.

Overview

SWE-Smith-DataScience provides a modular pipeline to fine-tune large language models on specialized data science problems extracted from the SWE-Smith dataset and evaluate them via the SWE-agent framework.
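As a rough illustration of the "extract data science problems" step, the sketch below filters task records by repository name. It is not the project's actual selection logic: the `repo` and `instance_id` field names, the keyword list, and the `select_data_science` helper are all assumptions made for this example.

```python
from typing import Dict, Iterable, List

# Hypothetical keyword list: which repositories count as "data science"
# is an assumption for illustration, not the project's selection rule.
DATA_SCIENCE_KEYWORDS = ("pandas", "numpy", "scikit-learn", "matplotlib")


def select_data_science(
    records: Iterable[Dict[str, str]],
    keywords: Iterable[str] = DATA_SCIENCE_KEYWORDS,
) -> List[Dict[str, str]]:
    """Keep records whose "repo" field mentions any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        r for r in records
        if any(k in r.get("repo", "").lower() for k in lowered)
    ]


# Toy records standing in for dataset rows; the field names are assumed.
sample = [
    {"instance_id": "a-1", "repo": "pandas-dev/pandas"},
    {"instance_id": "b-2", "repo": "pallets/flask"},
    {"instance_id": "c-3", "repo": "numpy/numpy"},
]

subset = select_data_science(sample)
print([r["instance_id"] for r in subset])  # → ['a-1', 'c-3']
```

In a real run the records would come from the dataset itself rather than a hand-written list, and the matching rule would likely need to be stricter than a substring test.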

Features

Model Fine-Tuning: Leverage finetune.py to train on data subsets.
Automated Evaluation: Use the SWE-agent module to benchmark performance.
Reproducible Workflows: Modular scripts for data processing and training.

Prerequisites

Python 3.8 or higher
Git
(Optional) Modal CLI for scalable remote training

Required Python packages:

pip install torch transformers datasets modal
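After installing, a quick check like the following can confirm the environment matches the prerequisites above. It is a minimal sketch using only the standard library; treating `modal` as optional follows the prerequisites list, and the helper names `python_ok` and `missing_packages` are made up for this example.

```python
import importlib.util
import sys

# Package names taken from the pip command above.
REQUIRED = ("torch", "transformers", "datasets")
# Assumed optional, since the Modal CLI is listed as optional.
OPTIONAL = ("modal",)


def python_ok(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


print("Python version OK:", python_ok())
print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

The check only looks packages up on the import path without importing them, so it stays fast even when `torch` is installed.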

Installation

git clone https://github.com/HugoGoHe/SWE-Smith-DataScience.git
cd SWE-Smith-DataScience

Project Structure

SWE-Smith-DataScience/
├── SWE-agent/           # Evaluation agent and benchmarking scripts
├── finetune.py          # Fine-tuning script for the model
├── requirements.txt     # Python dependencies (optional)
├── LICENSE              # MIT License
└── README.md            # Project documentation
