FeatureForge LLM is a Python package that uses large language models (LLMs) to automate feature engineering. It helps data scientists and machine learning engineers discover, generate, and implement features for their datasets.
- 🤖 AI-Powered Feature Suggestions: Generate feature engineering recommendations using state-of-the-art language models
- 🛠️ Automatic Feature Implementation: Automatically convert feature suggestions into executable Python code
- 🔒 Safe Code Execution: Built-in safety checks to ensure secure feature generation
- 📊 Multi-Provider Support: Compatible with multiple LLM providers like OpenAI and Google Gemini
- 📈 Performance Benchmarking: Analyze feature implementation performance and impact
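
To give a rough idea of what a safety check on generated code can look like, here is a minimal sketch that scans Python source with the standard `ast` module before it is executed. This is not FeatureForge LLM's actual implementation: the function name `find_unsafe_constructs`, the `BLOCKED_NAMES` policy, and the column names in the example strings are all hypothetical, chosen only for illustration.

```python
import ast

# Hypothetical policy: names that generated feature code should never reference.
BLOCKED_NAMES = {"eval", "exec", "open", "compile", "__import__"}


def find_unsafe_constructs(source):
    """Return a list of human-readable problems found in `source`."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append(f"line {node.lineno}: import statement")
        elif isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            problems.append(f"line {node.lineno}: use of {node.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            problems.append(f"line {node.lineno}: dunder attribute {node.attr}")
    return problems


# Hypothetical generated snippets (column names are made up for the example)
safe_code = "df['age_x_duration'] = df['age'] * df['treatment_days']"
unsafe_code = "import os\nos.system('echo hi')"

print(find_unsafe_constructs(safe_code))
print(find_unsafe_constructs(unsafe_code))
```

A static scan like this only rejects obviously risky constructs. It is not a sandbox, so generated code should still be reviewed before it runs on real data.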
pip install featureforge-llm

from featureforge_llm import LLMFeaturePipeline

# Initialize the pipeline with your LLM provider
pipeline = LLMFeaturePipeline(
    llm_api_key="YOUR_API_KEY",
    provider="gemini",  # or "openai"
    model="gemini-1.5-flash",
    verbose=True
)

# Define your task description and dataset background
task_description = (
    "Predict the disease state of patients with liver cirrhosis. "
    "The objective is to use multi-classification methods to predict "
    "the final disease status of patients."
)
dataset_background = (
    "This dataset contains various physiological indicators "
    "and treatment plans for liver cirrhosis patients."
)

# Get feature suggestions
# (train_data and test_data are assumed to be pandas DataFrames you have already loaded)
suggestions = pipeline.ask_for_feature_suggestions(
    df=train_data,
    task_description=task_description,
    target_column="Status",
    dataset_background=dataset_background
)

# Implement all suggested features on the training data
result_df = pipeline.implement_all_suggestions(train_data)

# Apply the saved transformations to the test data
test_result_df = pipeline.apply_saved_transformations(test_data)

# Create a custom feature with a natural language description
custom_feature_df = pipeline.custom_feature_request(
    df=train_data,
    feature_description="Create an interaction feature between patient age and treatment duration"
)

# Benchmark a specific feature implementation
benchmark_results = pipeline.benchmark_feature_implementation(
    df=train_data,
    suggestion_id="your_suggestion_id",
    iterations=5
)

Supported LLM providers:
- OpenAI (GPT models)
- Google Gemini
- More providers coming soon!

Dependencies:
- pandas
- numpy
- openai (optional)
- google-generativeai (optional)
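
Because the provider SDKs in the list above are optional, it can help to check which ones are present before choosing a provider. The sketch below uses only the standard library. The helper `is_installed` and the `OPTIONAL_SDKS` mapping are hypothetical names, and the import paths are assumed from the PyPI package names (`openai` and `google.generativeai`); FeatureForge LLM may perform its own checks differently.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. "google") is missing or malformed
        return False


# Assumed mapping from provider name to the SDK's import path
OPTIONAL_SDKS = {
    "openai": "openai",
    "gemini": "google.generativeai",
}

for provider, module_name in OPTIONAL_SDKS.items():
    status = "available" if is_installed(module_name) else "not installed"
    print(f"{provider}: {module_name} is {status}")
```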

Configuration parameters:
- llm_api_key: Your API key for the selected LLM provider
- provider: "openai" or "gemini"
- model: Specific model to use (e.g., "gpt-4", "gemini-1.5-flash")
- verbose: Enable detailed logging (default: True)
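
One way to avoid hard-coding the API key is to read it from an environment variable and collect the four parameters into a dictionary. The following is a sketch under stated assumptions: the helper `build_pipeline_kwargs`, the `PROVIDER_DEFAULTS` table, and the environment variable names `OPENAI_API_KEY` and `GEMINI_API_KEY` are illustrative choices, not part of the package.

```python
import os

# Assumed environment variable names and default models; adjust to your setup.
PROVIDER_DEFAULTS = {
    "openai": {"env_var": "OPENAI_API_KEY", "model": "gpt-4"},
    "gemini": {"env_var": "GEMINI_API_KEY", "model": "gemini-1.5-flash"},
}


def build_pipeline_kwargs(provider, env=None, verbose=True):
    """Collect the four documented constructor arguments into a dict."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    env = os.environ if env is None else env
    defaults = PROVIDER_DEFAULTS[provider]
    api_key = env.get(defaults["env_var"])
    if not api_key:
        raise RuntimeError(f"Set {defaults['env_var']} before creating the pipeline")
    return {
        "llm_api_key": api_key,
        "provider": provider,
        "model": defaults["model"],
        "verbose": verbose,
    }


# Demonstration with a fake environment so the snippet runs without credentials
kwargs = build_pipeline_kwargs("gemini", env={"GEMINI_API_KEY": "example-key"})
print(kwargs)
```

With the package installed, the resulting dictionary could then be passed on as `LLMFeaturePipeline(**kwargs)`.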
Contributions are welcome! Please check our GitHub repository for guidelines.
MIT License
Feature suggestions are generated by AI and should be carefully reviewed by domain experts before implementation.
If you use FeatureForge LLM in your work, please cite it as follows:
@software{FeatureForgeLLM,
  author = {Feifan Zhang},
  title = {FeatureForge LLM: Automated Feature Engineering with Large Language Models},
  year = {2024},
  version = {1.0.0},
  url = {https://github.com/cgxjdzz/FeatureForge-LLM},
  note = {Python package}
}