This project enables experiments with large language models (LLMs) for classification tasks. It supports processing data using predefined configurations, handling multiple model setups, and generating evaluation reports.
- Run Experiments with Configurable Inputs: Run classification tasks using CSV input files and configuration settings.
- Support for Investigator and Model Modes:
  - Investigator Mode: Execute experiments for a specific investigator using predefined configurations.
  - Models Mode: Execute experiments for multiple models with their respective configurations.
- Generative Model Integration: Uses LLMs for predictions with user-defined prompts.
- Partial Result Handling: Saves intermediate results to prevent data loss during lengthy executions.
- Evaluation Metrics: Includes evaluation functionality such as edit distance analysis for classification performance.
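To make the edit distance metric concrete, here is a minimal sketch of a Levenshtein distance and a normalized similarity score between a predicted label and the true label. The function names `edit_distance` and `normalized_similarity` are hypothetical, and the normalization by the longer string's length is an assumption; the project's own evaluation code may compute this differently.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (excluding the current character) and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_similarity(predicted: str, expected: str) -> float:
    """Map the distance to [0, 1], where 1.0 means identical strings.
    Assumption: normalize by the length of the longer string."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(predicted, expected) / longest
```

A score like this can give partial credit when a model's output is a near miss of the expected label (for example, a misspelling), which exact-match accuracy would count as a plain error.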
- Clone the repository:
    git clone https://github.com/diverso-lab/ConfigurationLLMClassificator
    cd ConfigurationLLMClassificator
- Install dependencies:
    pip install -r requirements.txt
- Investigator Mode: Execute experiments for a specific investigator using their configuration:
    python main.py --mode i --investigator investigatorName
- Models Mode: Run experiments for multiple models, optionally filtering by specific model names:
    python main.py --mode models --models model1 model2
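As an illustration of how the two commands above could be parsed, here is a minimal sketch using `argparse`. The flag names and the mode values `i` and `models` come from the commands shown; the `parse_cli` helper, the help strings, and the validation rule are assumptions and may not match the project's actual main.py.

```python
import argparse


def parse_cli(argv=None):
    """Parse the command line (hypothetical helper, not the project's code)."""
    parser = argparse.ArgumentParser(
        description="Run LLM classification experiments."
    )
    parser.add_argument(
        "--mode", required=True,
        help="'i' for investigator mode, 'models' for models mode",
    )
    parser.add_argument(
        "--investigator",
        help="investigator name, used to locate that investigator's config",
    )
    parser.add_argument(
        "--models", nargs="*", default=[],
        help="optional model names to filter on; empty means no filter",
    )
    args = parser.parse_args(argv)

    # Assumption: investigator mode cannot run without a name.
    if args.mode == "i" and not args.investigator:
        parser.error("--investigator is required when --mode is i")
    return args
```

With this sketch, omitting `--models` in models mode yields an empty list, which a caller could treat as "run every model in the configuration file".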
A JSON file (e.g., configs/investigatorName_config.json) defines the settings for a single investigator:
{
  "csv_path": "path/to/data.csv",
  "model": "model_name",
  "system_prompt": "Define classification prompt",
  "max_tokens": 256,
  "temperature": 1,
  "true_column": "class"
}

A JSON file (e.g., configs/models_config.json) contains settings for multiple models:
[
  {
    "csv_path": "path/to/data1.csv",
    "model": "model1",
    "system_prompt": "Define prompt",
    "max_tokens": 256,
    "temperature": 1,
    "true_column": "class"
  },
  {
    "csv_path": "path/to/data2.csv",
    "model": "model2",
    "system_prompt": "Define another prompt",
    "max_tokens": 512,
    "temperature": 1,
    "true_column": "label"
  }
]

- Results Directory: Results are saved in the output/ directory with a unique hash based on the configuration.
- Files:
  - config.csv: Saves the configuration used for this experiment.
  - results.csv: Predicted labels for each instance.
  - report.csv: Performance metrics and evaluation results.
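One way a configuration-based hash can name the results directory is sketched below. The hashing scheme (SHA-256 over JSON with sorted keys, truncated to 12 characters), the key/value layout of config.csv, and the helper names `config_hash` and `prepare_output_dir` are all assumptions for illustration; the project may derive its hash and write its files differently.

```python
import csv
import hashlib
import json
from pathlib import Path


def config_hash(config: dict) -> str:
    """Return a short, stable identifier for a configuration.
    Sorting keys makes the hash independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def prepare_output_dir(config: dict, root: str = "output") -> Path:
    """Create <root>/<hash>/ and record the configuration in config.csv."""
    run_dir = Path(root) / config_hash(config)
    run_dir.mkdir(parents=True, exist_ok=True)
    with open(run_dir / "config.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["key", "value"])  # assumed layout: one row per setting
        for key, value in sorted(config.items()):
            writer.writerow([key, value])
    return run_dir
```

Under this scheme, rerunning with an identical configuration maps to the same directory, while changing any setting (for example `temperature`) produces a new one, so results from different setups do not overwrite each other.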