PromptShift


Reliable structured output from LLMs with deterministic generation

PromptShift is a Python library that provides a simple, reliable way to get structured output from Large Language Models (LLMs). It is built on top of Pydantic for schema validation and designed for deterministic, production-ready results.

Features

  • 🎯 Deterministic Output - Consistent, reproducible results with temperature=0.0 and deterministic sampling
  • 🔒 Type-Safe Schemas - Define output structure using Pydantic models with full type safety
  • ⚡ Fast & Lightweight - Minimal dependencies (only Pydantic and HTTPX), optimized for performance
  • 🔄 Automatic Retries - Smart retry logic with exponential backoff for validation errors
  • 💾 Built-in Caching - LRU cache for identical prompts to reduce API calls
  • 🎨 Multi-Provider Support - Works with Groq and other OpenAI-compatible APIs
  • 📊 Comprehensive Error Handling - Detailed error context for debugging and monitoring

Installation

Coming soon - will be available via pip:

pip install PromptShift

For detailed installation instructions and development setup, see the full documentation.

Quick Start

from PromptShift import Client
from pydantic import BaseModel

# Define your output schema
class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Initialize client
client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate structured output
result = client.generate(
    prompt="Describe Alice, a 30-year-old software engineer",
    schema=Person
)

print(f"{result.name} is {result.age} years old and works as a {result.occupation}")
# Output: Alice is 30 years old and works as a software engineer

Logging

PromptShift provides detailed logging to help you debug and monitor LLM interactions. Retry attempts are logged at INFO level by default, making them visible without additional configuration.

Default Logging Behavior

When retry attempts occur, PromptShift logs the following information at INFO level:

Successful attempt example:

INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Alice",
  "age": 30
}
INFO - Valid? Yes

Failed attempt example:

INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Bob",
  "age": "thirty"
}
INFO - Valid? No
INFO - Validation errors:
  - age: Input should be a valid integer

Final failure (after all retries exhausted):

ERROR - All 4 retry attempts exhausted for schema Person
[Full exception with stack trace follows]

Configuring Logging Levels

You can customize logging behavior using Python's standard logging module:

import logging

# Show all logs (including DEBUG level internal details)
logging.basicConfig(level=logging.DEBUG)

# Show only INFO and above (default - includes retry attempts)
logging.basicConfig(level=logging.INFO)

# Show only warnings and errors (hides retry attempt details)
logging.basicConfig(level=logging.WARNING)

# Disable all logging
logging.basicConfig(level=logging.CRITICAL)

Library-Specific Logger

To configure only PromptShift logs without affecting other libraries:

import logging

# Get PromptShift logger
pl_logger = logging.getLogger("PromptShift")

# Set level for PromptShift only
pl_logger.setLevel(logging.DEBUG)

# Add custom handler
handler = logging.FileHandler("PromptShift.log")
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
pl_logger.addHandler(handler)

Log Levels Used

  • DEBUG: Internal state, detailed diagnostics, prompt enhancement details
  • INFO: Retry attempts, LLM outputs, validation results (visible by default)
  • WARNING: Recoverable issues (currently unused in retry logic)
  • ERROR: Final retry exhaustion, unrecoverable failures
  • CRITICAL: Configuration errors (currently unused)

Example: Production Logging Setup

import logging
from PromptShift import Client

# Production configuration - only errors and critical issues
logging.basicConfig(
    level=logging.ERROR,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler()
    ]
)

# Use client normally - retry attempts won't appear in logs
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)

Example: Development/Debug Logging Setup

import logging
from PromptShift import Client

# Development configuration - see everything
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(funcName)s - %(message)s"
)

# Use client - all retry details, prompts, and internal state visible
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)

Deterministic Generation

PromptShift is designed for 100% reproducible outputs to enable reliable testing, debugging, and production deployments. Every generation is deterministic by default.

Why Determinism Matters

  • Testing: Write reliable unit tests that verify exact LLM outputs
  • Debugging: Reproduce issues consistently to identify root causes
  • Reproducibility: Same inputs always produce same outputs (within model version)
  • Consistency: Eliminate randomness from production workflows

How Determinism Works

PromptShift achieves deterministic outputs through three mechanisms:

  1. Temperature = 0.0: Always uses temperature 0.0 for deterministic sampling
  2. Automatic Seed Generation: Creates deterministic seed from hash(prompt + JSON schema)
  3. Consistent Hashing: Same prompt and schema always produce same seed
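The seed derivation described above could look something like this. It is a hedged sketch, not the library's actual code: `derive_seed` is a hypothetical helper, and a cryptographic hash is used deliberately, since Python's built-in `hash()` is salted per process and would not be reproducible across runs.

```python
import hashlib
import json

def derive_seed(prompt: str, json_schema: dict) -> int:
    """Derive a stable integer seed from the prompt and the JSON schema.

    Same prompt + same schema -> same seed, across processes and machines.
    """
    canonical = prompt + json.dumps(json_schema, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**31)  # fit into a provider-friendly int range

schema = {"properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}
seed = derive_seed("Generate Alice, age 30", schema)
```

Sorting the schema keys before hashing ensures that the same schema always serializes to the same string, which is what makes the third mechanism (consistent hashing) hold.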

Basic Deterministic Usage

from PromptShift import Client
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate multiple times - same output every time!
result1 = client.generate("Generate Alice, age 30", Person)
result2 = client.generate("Generate Alice, age 30", Person)
result3 = client.generate("Generate Alice, age 30", Person)

assert result1.name == result2.name == result3.name
assert result1.age == result2.age == result3.age
# All assertions pass - 100% deterministic!

Explicit Seed for Testing

For testing scenarios where you need guaranteed reproducibility, you can provide an explicit seed:

# Use explicit seed for test fixtures
def test_person_generation():
    client = Client(provider="groq", model="llama-3.1-8b-instant")

    # Same seed = same output, even across test runs
    result = client.generate(
        prompt="Generate a person",
        schema=Person,
        seed=12345  # Explicit seed for reproducibility
    )

    assert result.name == "Alice"  # Predictable output
    assert result.age == 30

Determinism Guarantees and Limitations

✅ Guaranteed deterministic when:

  • Same prompt text
  • Same Pydantic schema definition
  • Same model version (e.g., llama-3.1-8b-instant)
  • Same provider infrastructure

⚠️ May change when:

  • Model version is updated by provider (e.g., llama-3.1 → llama-3.2)
  • Provider infrastructure changes
  • Schema definition changes (field names, types, order)
  • Prompt text changes (even whitespace differences)

💡 Best Practice: Pin your model versions in production and use explicit seeds in tests for maximum reproducibility.

API Documentation

For more details on seed parameter and determinism options, see the API Reference.

Groq API Seed Documentation: Groq API Docs

Performance

PromptShift is designed for minimal overhead - the library adds less than 1ms to your requests (excluding the actual LLM API call).

Performance Characteristics

Library Overhead (excluding LLM API latency):

  • Total overhead: <1ms (~0.18ms average)
  • Cache key generation: <1ms (~60µs average)
  • Schema validation: <10ms for typical schemas (~1-3µs average)
  • Cache lookup: <10ms (~185ns average)
  • Memory usage: <50MB for 100 cached responses
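As a rough illustration of how the cache-key generation and cache lookup steps above could fit together, here is a minimal sketch assuming a hash-based key and an `OrderedDict`-backed store; the library's internal implementation may differ.

```python
import hashlib
import json
from collections import OrderedDict

class LRUCache:
    """Bounded cache keyed on (prompt, schema); evicts the least recently used entry."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self._store = OrderedDict()

    @staticmethod
    def make_key(prompt: str, json_schema: dict) -> str:
        canonical = prompt + json.dumps(json_schema, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, key):
        if key not in self._store:
            return None  # cache miss
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used
```

Because the key is a pure function of the prompt and schema, identical `generate()` calls map to the same entry, which is what makes repeated requests return without an API call.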

Caching Performance Benefits:

  • Cache hit: Returns instantly (~185ns lookup time)
  • Cache miss: Normal generation + validation (~0.18ms overhead)
  • Speedup: 50-200x faster for repeated requests

Performance Example

from PromptShift import Client
from pydantic import BaseModel
import time

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# First call - cache miss (includes LLM API call ~500-2000ms)
start = time.time()
result1 = client.generate("Generate Alice, age 30", Person)
first_call = time.time() - start
print(f"First call (cache miss): {first_call*1000:.1f}ms")

# Second call - cache hit (instant return)
start = time.time()
result2 = client.generate("Generate Alice, age 30", Person)
cached_call = time.time() - start
print(f"Second call (cache hit): {cached_call*1000:.1f}ms")

print(f"Speedup: {first_call/cached_call:.0f}x faster")
# Output:
# First call (cache miss): 523.4ms
# Second call (cache hit): 0.2ms
# Speedup: 2617x faster

Bypassing Cache for Non-Deterministic Scenarios

For cases where you want fresh results each time (e.g., creative content generation):

# Disable cache for this request
result = client.generate(
    prompt="Generate a random story",
    schema=Story,
    use_cache=False  # Skip cache, always call LLM
)

Performance Benchmarks

Run the included benchmarks to verify performance on your system:

# Run all performance benchmarks
uv run pytest tests/performance/test_benchmarks.py --benchmark-only

# Run memory profiling
uv run pytest tests/performance/test_memory.py --memray

For detailed performance analysis, see docs/performance.md.

Development Setup

Prerequisites

  • Python 3.9 or higher (development on 3.12 recommended)
  • uv package manager

Installation

  1. Clone the repository:
git clone https://github.com/aritroCoder/PromptShift.git
cd PromptShift
  2. Install uv (if not already installed):
curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Create virtual environment and install dependencies:
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev,test,docs]"
  4. Install pre-commit hooks:
uv run pre-commit install
  5. Run tests to verify setup:
uv run pytest

Development Commands

# Run all tests (unit tests only, skips integration)
uv run pytest

# Run unit tests only
uv run pytest tests/unit/

# Run tests with coverage
uv run pytest --cov=src/PromptShift --cov-report=html

# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

# Type check
uv run mypy src/

# Run all checks (same as CI)
uv run pre-commit run --all-files

Running Integration Tests

Integration tests validate error handling and API behavior with real Groq API calls. These tests are optional and require a valid API key.

Setup:

  1. Get a Groq API key from https://console.groq.com
  2. Set the environment variable:
    export GROQ_API_KEY=your_api_key_here

Run integration tests:

# Run all integration tests
GROQ_API_KEY=your_key uv run pytest tests/integration/ -m integration -v

# Run specific integration test file
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py -m integration -v

# Run specific test function
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py::test_invalid_api_key_raises_authentication_error -m integration -v

Skip integration tests:

# Run only unit tests, skip integration tests
uv run pytest -m "not integration"

Run GitHub Actions locally

act -j test -W .github/workflows/test.yml

If act is not installed, install it first (e.g. brew install act on macOS).

Important Notes:

  • Integration tests make real API calls and consume API quota
  • Tests will be skipped automatically if GROQ_API_KEY is not set
  • Rate limits may affect test execution if run frequently
  • CI runs integration tests automatically if the API key secret is configured (non-blocking)

Documentation

Full documentation is available at docs/index.md.

Build and view documentation locally:

# Install documentation dependencies
uv pip install -e ".[docs]"

# Build documentation
mkdocs build

# Serve locally
mkdocs serve
# Open http://127.0.0.1:8000 in your browser

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Roadmap

  • Epic 1: Foundation & Core Client API
  • Epic 2: Validation, Error Handling & Retry Logic
  • Epic 3: Determinism & Caching
  • Epic 4: Documentation, Examples & Polish
