Reliable structured output from LLMs with deterministic generation
PromptShift is a Python library that provides a simple, reliable way to get structured outputs from Large Language Models (LLMs). It is built on top of Pydantic for schema validation and designed for deterministic, production-ready results.
- 🎯 Deterministic Output - Consistent, reproducible results with temperature=0.0 and deterministic sampling
- 🔒 Type-Safe Schemas - Define output structure using Pydantic models with full type safety
- ⚡ Fast & Lightweight - Minimal dependencies (only Pydantic and HTTPX), optimized for performance
- 🔄 Automatic Retries - Smart retry logic with exponential backoff for validation errors
- 💾 Built-in Caching - LRU cache for identical prompts to reduce API calls
- 🎨 Multi-Provider Support - Works with Groq and other OpenAI-compatible APIs
- 📊 Comprehensive Error Handling - Detailed error context for debugging and monitoring
Coming soon - will be available via pip:
```shell
pip install PromptShift
```

For detailed installation instructions and development setup, see the full documentation.
```python
from PromptShift import Client
from pydantic import BaseModel

# Define your output schema
class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Initialize client
client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate structured output
result = client.generate(
    prompt="Describe Alice, a 30-year-old software engineer",
    schema=Person,
)

print(f"{result.name} is {result.age} years old and works as a {result.occupation}")
# Output: Alice is 30 years old and works as a software engineer
```

PromptShift provides detailed logging to help you debug and monitor LLM interactions. Retry attempts are logged at INFO level by default, making them visible without additional configuration.
When retry attempts occur, PromptShift logs the following information at INFO level:
Successful attempt example:
```
INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Alice",
  "age": 30
}
INFO - Valid? Yes
```
Failed attempt example:
```
INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Bob",
  "age": "thirty"
}
INFO - Valid? No
INFO - Validation errors:
  - age: Input should be a valid integer
```
Final failure (after all retries exhausted):
```
ERROR - All 4 retry attempts exhausted for schema Person
[Full exception with stack trace follows]
```
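The retry flow that produces logs like these can be sketched as follows. This is an illustrative approximation only: `generate_with_retries`, `call_llm`, and `validate` are hypothetical names, and the real library validates with Pydantic rather than the toy validator shown here.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("PromptShift")

def generate_with_retries(call_llm, validate, max_attempts=4, base_delay=0.0):
    """Retry until the output validates, backing off exponentially between attempts."""
    last_errors = None
    for attempt in range(1, max_attempts + 1):
        label = "original" if attempt == 1 else f"retry {attempt - 1}"
        log.info("Attempt %d (%s)", attempt, label)
        raw = call_llm()
        log.info("Cleaned LLM output:\n%s", raw)
        ok, errors = validate(raw)
        log.info("Valid? %s", "Yes" if ok else "No")
        if ok:
            return json.loads(raw)
        last_errors = errors
        log.info("Validation errors:\n%s", "\n".join(errors))
        time.sleep(base_delay * (2 ** (attempt - 1)))  # exponential backoff
    raise ValueError(f"All {max_attempts} retry attempts exhausted: {last_errors}")

# Toy "LLM" that fails validation once, then succeeds
outputs = iter(['{"name": "Bob", "age": "thirty"}', '{"name": "Bob", "age": 30}'])

def validate_person(raw):
    data = json.loads(raw)
    if not isinstance(data.get("age"), int):
        return False, ["- age: Input should be a valid integer"]
    return True, []

result = generate_with_retries(lambda: next(outputs), validate_person)
```

Running this sketch emits an "Attempt 1 ... Valid? No" sequence followed by a successful second attempt, mirroring the log format above.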
You can customize logging behavior using Python's standard logging module:
```python
import logging

# Show all logs (including DEBUG level internal details)
logging.basicConfig(level=logging.DEBUG)

# Show only INFO and above (default - includes retry attempts)
logging.basicConfig(level=logging.INFO)

# Show only warnings and errors (hides retry attempt details)
logging.basicConfig(level=logging.WARNING)

# Suppress everything below CRITICAL
logging.basicConfig(level=logging.CRITICAL)
```

To configure only PromptShift logs without affecting other libraries:
```python
import logging

# Get the PromptShift logger
pl_logger = logging.getLogger("PromptShift")

# Set level for PromptShift only
pl_logger.setLevel(logging.DEBUG)

# Add a custom handler
handler = logging.FileHandler("PromptShift.log")
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
pl_logger.addHandler(handler)
```

- DEBUG: Internal state, detailed diagnostics, prompt enhancement details
- INFO: Retry attempts, LLM outputs, validation results (visible by default)
- WARNING: Recoverable issues (currently unused in retry logic)
- ERROR: Final retry exhaustion, unrecoverable failures
- CRITICAL: Configuration errors (currently unused)
```python
import logging
from PromptShift import Client

# Production configuration - only errors and critical issues
logging.basicConfig(
    level=logging.ERROR,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler(),
    ],
)

# Use client normally - retry attempts won't appear in logs
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)
```

```python
import logging
from PromptShift import Client

# Development configuration - see everything
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(funcName)s - %(message)s",
)

# Use client - all retry details, prompts, and internal state visible
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)
```

PromptShift is designed for 100% reproducible outputs to enable reliable testing, debugging, and production deployments. Every generation is deterministic by default.
- Testing: Write reliable unit tests that verify exact LLM outputs
- Debugging: Reproduce issues consistently to identify root causes
- Reproducibility: Same inputs always produce same outputs (within model version)
- Consistency: Eliminate randomness from production workflows
PromptShift achieves deterministic outputs through three mechanisms:
- Temperature = 0.0: Always uses temperature 0.0 for deterministic sampling
- Automatic Seed Generation: Creates deterministic seed from hash(prompt + JSON schema)
- Consistent Hashing: Same prompt and schema always produce same seed
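The seed-derivation idea can be sketched as below. This is a hypothetical `deterministic_seed` helper under the assumptions above, not the library's actual hashing scheme. Note the use of SHA-256 rather than Python's built-in `hash()`, which is randomized per process (`PYTHONHASHSEED`) and therefore unsuitable for cross-run determinism.

```python
import hashlib
import json

def deterministic_seed(prompt: str, schema: dict) -> int:
    """Derive a stable integer seed from the prompt text plus the JSON schema."""
    canonical = prompt + json.dumps(schema, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**31)  # fit into a typical int seed range

schema = {"properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}
seed_a = deterministic_seed("Generate Alice, age 30", schema)
seed_b = deterministic_seed("Generate Alice, age 30", schema)
seed_c = deterministic_seed("Generate Bob, age 25", schema)

assert seed_a == seed_b  # same prompt + schema -> same seed
assert seed_a != seed_c  # different prompt -> different seed
```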
```python
from PromptShift import Client
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate multiple times - same output every time!
result1 = client.generate("Generate Alice, age 30", Person)
result2 = client.generate("Generate Alice, age 30", Person)
result3 = client.generate("Generate Alice, age 30", Person)

assert result1.name == result2.name == result3.name
assert result1.age == result2.age == result3.age
# All assertions pass - 100% deterministic!
```

For testing scenarios where you need guaranteed reproducibility, you can provide an explicit seed:
```python
# Use an explicit seed for test fixtures
def test_person_generation():
    client = Client(provider="groq", model="llama-3.1-8b-instant")

    # Same seed = same output, even across test runs
    result = client.generate(
        prompt="Generate a person",
        schema=Person,
        seed=12345,  # Explicit seed for reproducibility
    )

    assert result.name == "Alice"  # Predictable output
    assert result.age == 30
```

✅ Guaranteed deterministic when:
- Same prompt text
- Same Pydantic schema definition
- Same model version (e.g., `llama-3.1-8b-instant`)
- Same provider infrastructure

⚠️ Determinism may break when:
- The model version is updated by the provider (e.g., `llama-3.1` → `llama-3.2`)
- Provider infrastructure changes
- The schema definition changes (field names, types, order)
- The prompt text changes (even whitespace differences)
💡 Best Practice: Pin your model versions in production and use explicit seeds in tests for maximum reproducibility.
For more details on seed parameter and determinism options, see the API Reference.
Groq API Seed Documentation: Groq API Docs
PromptShift is designed for minimal overhead - the library adds less than 1ms to your requests (excluding the actual LLM API call).
Library Overhead (excluding LLM API latency):
- Total overhead: <1ms (~0.18ms average)
- Cache key generation: <1ms (~60µs average)
- Schema validation: <10ms for typical schemas (~1-3µs average)
- Cache lookup: <10ms (~185ns average)
- Memory usage: <50MB for 100 cached responses
Caching Performance Benefits:
- Cache hit: Returns instantly (~185ns lookup time)
- Cache miss: Normal generation + validation (~0.18ms overhead)
- Speedup: 50-200x faster for repeated requests
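The caching behavior can be approximated with a small LRU cache keyed on the prompt plus schema. This is an illustrative sketch (the `LRUCache` class and its key scheme are hypothetical, not PromptShift's internals):

```python
import hashlib
import json
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache keyed by (prompt, schema) - illustrative sketch only."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()

    @staticmethod
    def key(prompt, schema):
        raw = prompt + json.dumps(schema, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, k):
        if k not in self._data:
            return None  # cache miss
        self._data.move_to_end(k)  # mark as most recently used
        return self._data[k]

    def put(self, k, value):
        self._data[k] = value
        self._data.move_to_end(k)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=100)
k = LRUCache.key("Generate Alice, age 30", {"name": "string", "age": "integer"})
cache.put(k, {"name": "Alice", "age": 30})

assert cache.get(k) == {"name": "Alice", "age": 30}  # hit: returned from memory
assert cache.get("unknown-key") is None              # miss: would trigger generation
```

A dictionary lookup like this is why a cache hit costs nanoseconds while a cache miss pays the full LLM round trip.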
```python
from PromptShift import Client
from pydantic import BaseModel
import time

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# First call - cache miss (includes LLM API call ~500-2000ms)
start = time.time()
result1 = client.generate("Generate Alice, age 30", Person)
first_call = time.time() - start
print(f"First call (cache miss): {first_call*1000:.1f}ms")

# Second call - cache hit (instant return)
start = time.time()
result2 = client.generate("Generate Alice, age 30", Person)
cached_call = time.time() - start
print(f"Second call (cache hit): {cached_call*1000:.1f}ms")
print(f"Speedup: {first_call/cached_call:.0f}x faster")

# Output:
# First call (cache miss): 523.4ms
# Second call (cache hit): 0.2ms
# Speedup: 2617x faster
```

For cases where you want fresh results each time (e.g., creative content generation):
```python
# Disable the cache for this request
result = client.generate(
    prompt="Generate a random story",
    schema=Story,
    use_cache=False,  # Skip cache, always call the LLM
)
```

Run the included benchmarks to verify performance on your system:
```shell
# Run all performance benchmarks
uv run pytest tests/performance/test_benchmarks.py --benchmark-only

# Run memory profiling
uv run pytest tests/performance/test_memory.py --memray
```

For detailed performance analysis, see docs/performance.md.
- Python 3.9 or higher (development on 3.12 recommended)
- uv package manager
- Clone the repository:

```shell
git clone https://github.com/aritroCoder/PromptShift.git
cd PromptShift
```

- Install uv (if not already installed):

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Create a virtual environment and install dependencies:

```shell
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev,test,docs]"
```

- Install pre-commit hooks:

```shell
uv run pre-commit install
```

- Run tests to verify setup:

```shell
uv run pytest
```

```shell
# Run all tests (unit tests only, skips integration)
uv run pytest

# Run unit tests only
uv run pytest tests/unit/

# Run tests with coverage
uv run pytest --cov=src/PromptShift --cov-report=html

# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

# Type check
uv run mypy src/

# Run all checks (same as CI)
uv run pre-commit run --all-files
```

Integration tests validate error handling and API behavior with real Groq API calls. These tests are optional and require a valid API key.
Setup:
- Get a Groq API key from https://console.groq.com
- Set the environment variable:

```shell
export GROQ_API_KEY=your_api_key_here
```
Run integration tests:
```shell
# Run all integration tests
GROQ_API_KEY=your_key uv run pytest tests/integration/ -m integration -v

# Run a specific integration test file
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py -m integration -v

# Run a specific test function
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py::test_invalid_api_key_raises_authentication_error -m integration -v
```

Skip integration tests:

```shell
# Run only unit tests, skip integration tests
uv run pytest -m "not integration"
```

Run GitHub Actions locally (install act with `brew install act` if it is not already installed):

```shell
act -j test -W .github/workflows/test.yml
```
Important Notes:
- Integration tests make real API calls and consume API quota
- Tests are skipped automatically if `GROQ_API_KEY` is not set
- Rate limits may affect test execution if run frequently
- CI runs integration tests automatically if the API key secret is configured (non-blocking)
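The automatic skip behavior above is a standard pytest pattern. If your own suite needs the same behavior, a `conftest.py` hook along these lines would work (a sketch under the assumption that integration tests carry a `@pytest.mark.integration` marker; PromptShift's actual test configuration may differ):

```python
# conftest.py (sketch): auto-skip integration tests when no API key is set
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("GROQ_API_KEY"):
        return  # key present: run integration tests normally
    skip = pytest.mark.skip(reason="GROQ_API_KEY not set")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```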
Full documentation is available at docs/index.md.
Build and view documentation locally:
```shell
# Install documentation dependencies
uv pip install -e ".[docs]"

# Build documentation
mkdocs build

# Serve locally
mkdocs serve

# Open http://127.0.0.1:8000 in your browser
```

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Epic 1: Foundation & Core Client API
- Epic 2: Validation, Error Handling & Retry Logic
- Epic 3: Determinism & Caching
- Epic 4: Documentation, Examples & Polish
- 📫 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions