FeatureForge LLM: Automated Feature Engineering with Large Language Models

Overview

FeatureForge LLM is a Python package that uses large language models (LLMs) to automate feature engineering. You give it a dataset, a description of the prediction task, and optional domain background. It then asks an LLM to propose new features, turns those proposals into executable Python code, and applies that code to your data. This helps data scientists and machine learning engineers discover useful features across many kinds of datasets.

Key Features

🤖 AI-Powered Feature Suggestions: Generate feature engineering recommendations using state-of-the-art language models
🛠️ Automatic Feature Implementation: Convert feature suggestions into executable Python code and apply it to your data
🔒 Safe Code Execution: Built-in safety checks on generated code to reduce the risk of running harmful operations
📊 Multi-Provider Support: Works with multiple LLM providers, including OpenAI and Google Gemini
📈 Performance Benchmarking: Analyze feature implementation performance and impact
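The README does not say how the safety checks work. One common approach is to parse generated code with Python's `ast` module and reject it before execution if it imports or calls anything on a denylist. The sketch below shows that idea; `is_code_safe`, `BLOCKED_MODULES`, and `BLOCKED_CALLS` are hypothetical names and policies, not part of the FeatureForge LLM API.

```python
import ast

# Hypothetical policy: modules that generated feature code should never import.
BLOCKED_MODULES = {"os", "subprocess", "sys", "shutil", "socket"}
# Hypothetical policy: built-in calls that can run or load arbitrary code or files.
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def is_code_safe(source: str) -> bool:
    """Return False if the source fails to parse or uses a blocked import or call."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # e.g. "import os" or "import os.path"
            if any(alias.name.split(".")[0] in BLOCKED_MODULES for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            # e.g. "from subprocess import run"; relative imports have module=None
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls to blocked built-ins, e.g. "eval(...)"
            if node.func.id in BLOCKED_CALLS:
                return False
    return True


print(is_code_safe("df['ratio'] = df['a'] / df['b']"))  # True
print(is_code_safe("import os\nos.remove('data.csv')"))  # False
```

A static denylist like this is a first filter, not a sandbox. Determined code can evade it (for example via `getattr`), so untrusted code still warrants review or process-level isolation.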

Installation

pip install featureforge-llm

Quick Start

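import pandas as pd
from featureforge_llm import LLMFeaturePipeline

# Load your data (hypothetical file names; substitute your own)
train_data = pd.read_csv("train.csv")
test_data = pd.read_csv("test.csv")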

# Initialize the pipeline with your LLM provider
pipeline = LLMFeaturePipeline(
    llm_api_key="YOUR_API_KEY", 
    provider="gemini",  # or "openai"
    model="gemini-1.5-flash",
    verbose=True
)

# Define your task description and dataset background
task_description = (
    "Predict the disease state of patients with liver cirrhosis. "
    "The objective is to use multi-classification methods to predict "
    "the final disease status of patients."
)

dataset_background = (
    "This dataset contains various physiological indicators "
    "and treatment plans for liver cirrhosis patients."
)

# Get feature suggestions
suggestions = pipeline.ask_for_feature_suggestions(
    df=train_data,
    task_description=task_description,
    target_column="Status",
    dataset_background=dataset_background
)

# Implement all suggested features
result_df = pipeline.implement_all_suggestions(train_data)

# Apply the same saved transformations to the test data
test_result_df = pipeline.apply_saved_transformations(test_data)
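`apply_saved_transformations` replays on the test data the transformations that were applied to the training data. The sketch below illustrates that record-and-replay pattern in plain Python. It also shows why a stateful step, such as centering on a mean, should reuse statistics computed from the training rows rather than recompute them on test rows. `TransformationLog`, `add_ratio`, `make_centering_step`, and the `bilirubin`/`albumin` column names are hypothetical and do not describe the package's internals.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Transform = Callable[[List[Row]], List[Row]]


class TransformationLog:
    """Hypothetical record-and-replay helper; not the FeatureForge LLM API."""

    def __init__(self) -> None:
        self.steps: List[Transform] = []

    def apply_and_record(self, rows: List[Row], step: Transform) -> List[Row]:
        # Run the step on the training rows and remember it for later replay.
        self.steps.append(step)
        return step(rows)

    def replay(self, rows: List[Row]) -> List[Row]:
        # Apply every recorded step, in the original order, to new rows.
        for step in self.steps:
            rows = step(rows)
        return rows


def add_ratio(rows: List[Row]) -> List[Row]:
    # Stateless step: ratio of two hypothetical lab columns.
    return [{**r, "bili_alb_ratio": r["bilirubin"] / r["albumin"]} for r in rows]


def make_centering_step(train_rows: List[Row], column: str) -> Transform:
    # Stateful step: the mean is computed once, from training rows only,
    # and the returned closure reuses that same mean on any later data.
    mean = sum(r[column] for r in train_rows) / len(train_rows)

    def step(rows: List[Row]) -> List[Row]:
        return [{**r, f"{column}_centered": r[column] - mean} for r in rows]

    return step


train_rows = [{"bilirubin": 1.0, "albumin": 4.0}, {"bilirubin": 3.0, "albumin": 3.0}]
test_rows = [{"bilirubin": 2.0, "albumin": 2.0}]

log = TransformationLog()
train_rows = log.apply_and_record(train_rows, add_ratio)
train_rows = log.apply_and_record(train_rows, make_centering_step(train_rows, "bilirubin"))
test_rows = log.replay(test_rows)
print(test_rows)
# [{'bilirubin': 2.0, 'albumin': 2.0, 'bili_alb_ratio': 1.0, 'bilirubin_centered': 0.0}]
```

The test row is centered with the training mean (2.0), so train and test features live on the same scale and no test statistics leak into the pipeline.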

Advanced Usage

Custom Feature Request

# Create a custom feature with a natural language description
custom_feature_df = pipeline.custom_feature_request(
    df=train_data, 
    feature_description="Create an interaction feature between patient age and treatment duration"
)
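An interaction feature like the one requested above is typically the element-wise product of two columns. The sketch below shows that computation on a list of row dictionaries. `add_interaction` and the `age`/`treatment_duration` column names are hypothetical; the code the LLM actually generates for your request may differ.

```python
from typing import Dict, List

Row = Dict[str, float]


def add_interaction(rows: List[Row], a: str, b: str) -> List[Row]:
    """Return new rows with an extra column holding the product of columns a and b."""
    name = f"{a}_x_{b}"
    return [{**row, name: row[a] * row[b]} for row in rows]


patients = [
    {"age": 50.0, "treatment_duration": 2.0},
    {"age": 62.0, "treatment_duration": 0.5},
]
for row in add_interaction(patients, "age", "treatment_duration"):
    print(row["age_x_treatment_duration"])
# 100.0
# 31.0
```

With pandas, the same feature is a single vectorized expression: `df["age"] * df["treatment_duration"]`.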

Performance Benchmarking

# Benchmark a specific feature implementation
benchmark_results = pipeline.benchmark_feature_implementation(
    df=train_data, 
    suggestion_id="your_suggestion_id", 
    iterations=5
)
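The `iterations` parameter suggests that a feature implementation is timed over several runs. The sketch below shows a generic way to do that with `time.perf_counter` and summarize the results. `benchmark` is a hypothetical helper; the README does not document which metrics `benchmark_feature_implementation` actually reports.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(fn: Callable[[Any], Any], data: Any, iterations: int = 5) -> Dict[str, float]:
    """Call fn(data) `iterations` times and summarize wall-clock timings in seconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn(data)
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


result = benchmark(lambda xs: [x * x for x in xs], list(range(10_000)), iterations=5)
print(result)
```

Repeating the measurement and reporting the minimum alongside the mean helps separate the cost of the feature code from one-off noise such as cache warm-up.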

Supported LLM Providers

OpenAI (GPT models)
Google Gemini
More providers coming soon!

Dependencies

pandas
numpy
openai (optional)
google-generativeai (optional)
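Because the provider clients are optional, it can help to check which ones are installed before choosing a `provider`. The sketch below uses `importlib.util.find_spec` from the standard library. Note that the `google-generativeai` distribution is imported as `google.generativeai`. `is_installed`, `available_providers`, and the provider-to-module mapping are hypothetical helpers, not part of the package.

```python
import importlib.util
from typing import List

# Provider name -> importable module of its optional client package (assumed mapping).
OPTIONAL_PROVIDER_MODULES = {
    "openai": "openai",
    "gemini": "google.generativeai",
}


def is_installed(module: str) -> bool:
    """Return True if `module` is importable; only parents of dotted names get imported."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ModuleNotFoundError, ValueError):
        # find_spec imports the parent package of a dotted name, so a missing
        # parent (e.g. "google") raises instead of returning None.
        return False


def available_providers() -> List[str]:
    """List the providers whose client package is present in this environment."""
    return [p for p, m in OPTIONAL_PROVIDER_MODULES.items() if is_installed(m)]


print(available_providers())
```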

Configuration

llm_api_key: Your API key for the selected LLM provider
provider: "openai" or "gemini"
model: Specific model to use (e.g., "gpt-4", "gemini-1.5-flash")
verbose: Enable detailed logging (default: True)
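The four settings above can be collected and validated before the pipeline is created, so that a typo in `provider` fails early with a clear message. The sketch below does this with a frozen dataclass. `PipelineConfig` and `SUPPORTED_PROVIDERS` are hypothetical names; only the field names and the `verbose` default come from this README.

```python
from dataclasses import dataclass

# The two provider values this README documents.
SUPPORTED_PROVIDERS = ("openai", "gemini")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical mirror of the documented settings, validated on construction."""

    llm_api_key: str
    provider: str
    model: str
    verbose: bool = True  # documented default

    def __post_init__(self) -> None:
        if not self.llm_api_key:
            raise ValueError("llm_api_key must be a non-empty string")
        if self.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(
                f"provider must be one of {SUPPORTED_PROVIDERS}, got {self.provider!r}"
            )
        if not self.model:
            raise ValueError("model must be a non-empty string")


config = PipelineConfig(llm_api_key="YOUR_API_KEY", provider="gemini", model="gemini-1.5-flash")
print(config)
```

The field names match the keyword arguments used in Quick Start, so `LLMFeaturePipeline(**dataclasses.asdict(config))` should work, assuming the constructor accepts exactly those keywords.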

Contributing

Contributions are welcome! Please check our GitHub repository for guidelines.

License

MIT License

Disclaimer

Feature suggestions are generated by AI and should be carefully reviewed by domain experts before implementation.

Citation

If you use FeatureForge LLM in your work, please cite it as follows:

@software{FeatureForgeLLM,
  author = {Feifan Zhang},
  title = {FeatureForge LLM: Automated Feature Engineering with Large Language Models},
  year = {2024},
  version = {1.0.0},
  url = {https://github.com/cgxjdzz/FeatureForge-LLM},
  note = {Python package}
}
