FeatureForge LLM: Automated Feature Engineering with Large Language Models

Overview

FeatureForge LLM is a Python package that uses large language models (LLMs) to automate feature engineering. It helps data scientists and machine learning engineers discover candidate features, generate the code for them, and apply them to their datasets.

Key Features

  • 🤖 AI-Powered Feature Suggestions: Generate feature engineering recommendations using state-of-the-art language models
  • 🛠️ Automatic Feature Implementation: Automatically convert feature suggestions into executable Python code
  • 🔒 Safe Code Execution: Built-in safety checks to ensure secure feature generation
  • 📊 Multi-Provider Support: Compatible with multiple LLM providers like OpenAI and Google Gemini
  • 📈 Performance Benchmarking: Analyze feature implementation performance and impact
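
To give a sense of what a safety check on generated code can involve, here is a minimal sketch that scans a code string with Python's `ast` module before anything is executed. It is not FeatureForge LLM's actual implementation: the function `find_unsafe_nodes` and the `BLOCKED_CALLS` and `ALLOWED_IMPORTS` lists are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical lists for illustration; not taken from FeatureForge LLM.
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}
ALLOWED_IMPORTS = {"pandas", "numpy", "math"}


def find_unsafe_nodes(source: str) -> list[str]:
    """Return human-readable problems found in a generated code string."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Flag any import whose top-level package is not allow-listed.
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            module = (node.module or "").split(".")[0]
            if module not in ALLOWED_IMPORTS:
                problems.append(f"import from '{node.module}'")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Flag direct calls to blocked built-ins such as eval().
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to '{node.func.id}'")
    return problems
```

A static scan like this only catches the obvious cases (it does not follow attribute access or aliasing), so it complements human review of the generated code rather than replacing it.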

Installation

pip install featureforge-llm

Quick Start

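import pandas as pd
from featureforge_llm import LLMFeaturePipeline

# Load your data (placeholder paths; substitute your own files)
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")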

# Initialize the pipeline with your LLM provider
pipeline = LLMFeaturePipeline(
    llm_api_key="YOUR_API_KEY", 
    provider="gemini",  # or "openai"
    model="gemini-1.5-flash",
    verbose=True
)

# Define your task description and dataset background
task_description = (
    "Predict the disease state of patients with liver cirrhosis. "
    "The objective is to use multi-classification methods to predict "
    "the final disease status of patients."
)

dataset_background = (
    "This dataset contains various physiological indicators "
    "and treatment plans for liver cirrhosis patients."
)

# Get feature suggestions
suggestions = pipeline.ask_for_feature_suggestions(
    df=train_data,
    task_description=task_description,
    target_column="Status",
    dataset_background=dataset_background
)

# Implement all suggested features
result_df = pipeline.implement_all_suggestions(train_data)

# Apply the saved transformations to the test data
test_result_df = pipeline.apply_saved_transformations(test_data)
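
After applying the saved transformations, it can be worth confirming that the training and test frames ended up with the same feature columns. The sketch below uses plain pandas and assumes both results are ordinary DataFrames; `column_mismatch` is a hypothetical helper, not part of the package, and the toy frames stand in for `result_df` and `test_result_df`.

```python
import pandas as pd


def column_mismatch(train_df, test_df, target_column):
    """Compare feature columns of two frames, ignoring the target column."""
    train_cols = set(train_df.columns) - {target_column}
    test_cols = set(test_df.columns) - {target_column}
    return {
        "missing_in_test": sorted(train_cols - test_cols),
        "extra_in_test": sorted(test_cols - train_cols),
    }


# Toy stand-ins for result_df and test_result_df
train_out = pd.DataFrame({"Age": [50], "Age_sq": [2500], "Status": ["C"]})
test_out = pd.DataFrame({"Age": [61]})
report = column_mismatch(train_out, test_out, target_column="Status")
```

A non-empty `missing_in_test` list suggests that a transformation did not carry over to the test data.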

Advanced Usage

Custom Feature Request

# Create a custom feature with a natural language description
custom_feature_df = pipeline.custom_feature_request(
    df=train_data, 
    feature_description="Create an interaction feature between patient age and treatment duration"
)
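
For comparison, this is roughly what a hand-written version of such an interaction feature looks like in plain pandas. It is only a sketch of the kind of code the request describes, not the code the package generates; the helper `add_interaction` and the column names `Age` and `N_Days` are assumptions made for the example.

```python
import pandas as pd


def add_interaction(df, left, right, name=None):
    """Return a copy of df with a column holding the product of two columns."""
    name = name or f"{left}_x_{right}"
    out = df.copy()  # leave the caller's frame untouched
    out[name] = out[left] * out[right]
    return out


# Hypothetical columns: patient age (years) and treatment duration (days)
toy = pd.DataFrame({"Age": [50, 61], "N_Days": [400, 1200]})
toy_with_feature = add_interaction(toy, "Age", "N_Days")
```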

Performance Benchmarking

# Benchmark a specific feature implementation
benchmark_results = pipeline.benchmark_feature_implementation(
    df=train_data, 
    suggestion_id="your_suggestion_id", 
    iterations=5
)
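
The structure of `benchmark_results` is not documented here. As a rough illustration of what timing a feature function over several iterations involves, the sketch below measures any callable with `time.perf_counter`; `time_callable` and the keys of its result are hypothetical and independent of the package.

```python
import statistics
import time


def time_callable(func, *args, iterations=5):
    """Run func(*args) several times and summarise wall-clock durations."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return {
        "iterations": iterations,
        "mean_seconds": statistics.mean(durations),
        "min_seconds": min(durations),
        "max_seconds": max(durations),
    }


# Example: time a cheap built-in as a stand-in for a feature function
timing = time_callable(sum, range(10_000), iterations=5)
```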

Supported LLM Providers

  • OpenAI (GPT models)
  • Google Gemini
  • More providers coming soon!

Dependencies

  • pandas
  • numpy
  • openai (optional)
  • google-generativeai (optional)
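
Because the provider SDKs are optional, it can help to check which ones are installed before choosing a provider. The following sketch uses only the standard library. The import names `openai` and `google.generativeai` are those of the two SDKs listed above; the helper `available_providers` and its mapping are assumptions made for this example, not part of FeatureForge LLM.

```python
import importlib.util

# Provider name -> import name of its optional SDK
PROVIDER_MODULES = {"openai": "openai", "gemini": "google.generativeai"}


def available_providers():
    """Report which optional provider SDKs can be imported."""
    found = {}
    for provider, module_name in PROVIDER_MODULES.items():
        try:
            found[provider] = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent or broken.
            found[provider] = False
    return found
```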

Configuration

  • llm_api_key: Your API key for the selected LLM provider
  • provider: "openai" or "gemini"
  • model: Specific model to use (e.g., "gpt-4", "gemini-1.5-flash")
  • verbose: Enable detailed logging (default: True)
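
One way to keep API keys out of source code is to read them from environment variables and build the configuration from there. The sketch below assembles the four parameters listed above into a dictionary. The helper `pipeline_kwargs` and the variable names in `ENV_VARS` are conventions chosen for this example; the package itself may not read them.

```python
import os

# Hypothetical environment variable names, one per provider
ENV_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def pipeline_kwargs(provider="gemini", model="gemini-1.5-flash",
                    verbose=True, env=None):
    """Build keyword arguments for the pipeline from the environment."""
    env = os.environ if env is None else env
    if provider not in ENV_VARS:
        raise ValueError(f"unsupported provider: {provider!r}")
    api_key = env.get(ENV_VARS[provider])
    if not api_key:
        raise RuntimeError(f"set {ENV_VARS[provider]} before running")
    return {
        "llm_api_key": api_key,
        "provider": provider,
        "model": model,
        "verbose": verbose,
    }
```

The result can then be passed on as `LLMFeaturePipeline(**pipeline_kwargs())`, using the parameter names shown in the Quick Start.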

Contributing

Contributions are welcome! Please check our GitHub repository for guidelines.

License

MIT License

Disclaimer

Feature suggestions are generated by AI and should be carefully reviewed by domain experts before implementation.

Citation

If you use FeatureForge LLM in your work, please cite it as follows:

@software{FeatureForgeLLM,
  author = {Feifan Zhang},
  title = {FeatureForge LLM: Automated Feature Engineering with Large Language Models},
  year = {2024},
  version = {1.0.0},
  url = {https://github.com/cgxjdzz/FeatureForge-LLM},
  note = {Python package}
}
