
HugoGoHe/SWE-Smith-DataScience


SWE-Smith-DataScience


This project fine-tunes an open-source language model to specialize in data science tasks, using a subset of the SWE-Smith dataset.

Overview

SWE-Smith-DataScience provides a modular pipeline that fine-tunes large language models on data science problems extracted from the SWE-Smith dataset and evaluates the resulting models with the SWE-agent framework.

Features

  • Model Fine-Tuning: Use finetune.py to train on subsets of the dataset.
  • Automated Evaluation: Use the SWE-agent module to benchmark the fine-tuned model.
  • Reproducible Workflows: Modular scripts for data processing and training.
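
The README does not show how the data science portion of SWE-Smith is selected. As a minimal sketch, assuming each record carries a `repo` field and that a keyword match on the repository name is an acceptable filter (both are assumptions, not taken from finetune.py), the subset step could look like this. The function name `select_data_science_subset` and the keyword list are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical keyword list: the README does not say how the data science
# subset is chosen, so these library names are illustrative only.
DATA_SCIENCE_KEYWORDS = ("pandas", "numpy", "scikit-learn", "matplotlib")


def select_data_science_subset(
    records: Iterable[Dict[str, str]],
    keywords: Iterable[str] = DATA_SCIENCE_KEYWORDS,
) -> List[Dict[str, str]]:
    """Keep records whose (assumed) `repo` field mentions any keyword."""
    lowered = [k.lower() for k in keywords]
    return [
        rec for rec in records
        if any(k in rec.get("repo", "").lower() for k in lowered)
    ]


# Toy records standing in for dataset rows; not real SWE-Smith data.
toy_records = [
    {"repo": "pandas-dev/pandas", "problem_statement": "..."},
    {"repo": "pallets/flask", "problem_statement": "..."},
    {"repo": "scikit-learn/scikit-learn", "problem_statement": "..."},
]

subset = select_data_science_subset(toy_records)
print([rec["repo"] for rec in subset])
# prints ['pandas-dev/pandas', 'scikit-learn/scikit-learn']
```

With the Hugging Face datasets library, a similar predicate could be passed to `Dataset.filter`; the field names would need to be checked against the dataset actually used.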

Prerequisites

  • Python 3.8 or higher
  • Git
  • (Optional) Modal CLI for scalable remote training
  • Required Python packages (modal is needed only for remote training):
    pip install torch transformers datasets modal
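
To confirm the prerequisites above before training, a short check like the following may help. It is only a sketch: it tests whether each package can be located by the import system and does not verify versions or GPU support. The helper names `missing_packages` and `check_environment` are hypothetical and not part of this repository.

```python
import importlib.util
import sys

# Package names taken from the pip command in the prerequisites list.
REQUIRED_PACKAGES = ("torch", "transformers", "datasets")
OPTIONAL_PACKAGES = ("modal",)  # only needed for remote training
MIN_PYTHON = (3, 8)


def missing_packages(names):
    """Return the top-level package names the import system cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python %d.%d or higher is required" % MIN_PYTHON)
    for name in missing_packages(REQUIRED_PACKAGES):
        problems.append("missing required package: " + name)
    for name in missing_packages(OPTIONAL_PACKAGES):
        problems.append("missing optional package: " + name)
    return problems


for line in check_environment() or ["environment looks complete"]:
    print(line)
```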
    

Installation

git clone https://github.com/HugoGoHe/SWE-Smith-DataScience.git
cd SWE-Smith-DataScience

Project Structure

SWE-Smith-DataScience/
├── SWE-agent/           # Evaluation agent and benchmarking scripts
├── finetune.py          # Fine-tuning script for the model
├── requirements.txt     # Python dependencies (optional)
├── LICENSE              # MIT License
└── README.md            # Project documentation
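
After cloning, the layout above can be checked with a few lines of Python. This is a sketch under the assumption that the script is run from the repository root; `missing_entries` is a hypothetical helper, and requirements.txt is left out of the expected list because the tree marks it as optional.

```python
from pathlib import Path

# Entries listed in the project tree; requirements.txt is omitted because
# the tree marks it as optional.
EXPECTED_ENTRIES = ("SWE-agent", "finetune.py", "LICENSE", "README.md")


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


# Run from the repository root; an empty list means the layout matches.
print(missing_entries("."))
```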
