Reliable structured output from LLMs with deterministic generation
PromptShift is a Python library that provides a simple, reliable way to get structured outputs from Large Language Models (LLMs). It is built on top of Pydantic for schema validation and designed for deterministic, production-ready results.
- 🎯 Deterministic Output - Consistent, reproducible results with temperature=0.0 and deterministic sampling
- 🔒 Type-Safe Schemas - Define output structure using Pydantic models with full type safety
- ⚡ Fast & Lightweight - Minimal dependencies (only Pydantic and HTTPX), optimized for performance
- 🔄 Automatic Retries - Smart retry logic with exponential backoff for validation errors
- 💾 Built-in Caching - LRU cache for identical prompts to reduce API calls
- 🎨 Multi-Provider Support - Works with Groq and other OpenAI-compatible APIs
- 📊 Comprehensive Error Handling - Detailed error context for debugging and monitoring
Coming soon - will be available via pip:
```shell
pip install PromptShift
```

For detailed installation instructions and development setup, see the full documentation.
```python
from PromptShift import Client
from pydantic import BaseModel

# Define your output schema
class Person(BaseModel):
    name: str
    age: int
    occupation: str

# Initialize client
client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate structured output
result = client.generate(
    prompt="Describe Alice, a 30-year-old software engineer",
    schema=Person,
)

print(f"{result.name} is {result.age} years old and works as a {result.occupation}")
# Output: Alice is 30 years old and works as a software engineer
```

PromptShift provides detailed logging to help you debug and monitor LLM interactions. Retry attempts are logged at INFO level by default, making them visible without additional configuration.
When retry attempts occur, PromptShift logs the following information at INFO level:
Successful attempt example:
```
INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Alice",
  "age": 30
}
INFO - Valid? Yes
```
Failed attempt example:
```
INFO - Attempt 1 (original)
INFO - Cleaned LLM output:
{
  "name": "Bob",
  "age": "thirty"
}
INFO - Valid? No
INFO - Validation errors:
  - age: Input should be a valid integer
```
Final failure (after all retries exhausted):
```
ERROR - All 4 retry attempts exhausted for schema Person
[Full exception with stack trace follows]
```
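The retry flow that produces logs like these can be sketched as follows. This is an illustrative approximation only: `generate_with_retries`, `call_llm`, and `validate` are hypothetical names, and the real library validates with Pydantic rather than the toy validator shown here.

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("PromptShift")

def generate_with_retries(call_llm, validate, max_attempts=4, base_delay=0.0):
    """Retry until the output validates, backing off exponentially between attempts."""
    last_errors = None
    for attempt in range(1, max_attempts + 1):
        label = "original" if attempt == 1 else f"retry {attempt - 1}"
        log.info("Attempt %d (%s)", attempt, label)
        raw = call_llm()
        log.info("Cleaned LLM output:\n%s", raw)
        ok, errors = validate(raw)
        log.info("Valid? %s", "Yes" if ok else "No")
        if ok:
            return json.loads(raw)
        last_errors = errors
        log.info("Validation errors:\n%s", "\n".join(errors))
        time.sleep(base_delay * (2 ** (attempt - 1)))  # exponential backoff
    raise ValueError(f"All {max_attempts} retry attempts exhausted: {last_errors}")

# Toy "LLM" that fails validation once, then succeeds
outputs = iter(['{"name": "Bob", "age": "thirty"}', '{"name": "Bob", "age": 30}'])

def validate_person(raw):
    data = json.loads(raw)
    if not isinstance(data.get("age"), int):
        return False, ["- age: Input should be a valid integer"]
    return True, []

result = generate_with_retries(lambda: next(outputs), validate_person)
```

Running this sketch emits an "Attempt 1 ... Valid? No" sequence followed by a successful second attempt, mirroring the log format above.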
You can customize logging behavior using Python's standard logging module:
```python
import logging

# Show all logs (including DEBUG level internal details)
logging.basicConfig(level=logging.DEBUG)

# Show only INFO and above (default - includes retry attempts)
logging.basicConfig(level=logging.INFO)

# Show only warnings and errors (hides retry attempt details)
logging.basicConfig(level=logging.WARNING)

# Suppress everything below CRITICAL
logging.basicConfig(level=logging.CRITICAL)
```

To configure only PromptShift logs without affecting other libraries:
```python
import logging

# Get the PromptShift logger
pl_logger = logging.getLogger("PromptShift")

# Set level for PromptShift only
pl_logger.setLevel(logging.DEBUG)

# Add a custom handler
handler = logging.FileHandler("PromptShift.log")
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
pl_logger.addHandler(handler)
```

- DEBUG: Internal state, detailed diagnostics, prompt enhancement details
- INFO: Retry attempts, LLM outputs, validation results (visible by default)
- WARNING: Recoverable issues (currently unused in retry logic)
- ERROR: Final retry exhaustion, unrecoverable failures
- CRITICAL: Configuration errors (currently unused)
```python
import logging
from PromptShift import Client

# Production configuration - only errors and critical issues
logging.basicConfig(
    level=logging.ERROR,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler("app.log"),
        logging.StreamHandler(),
    ],
)

# Use client normally - retry attempts won't appear in logs
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)
```

```python
import logging
from PromptShift import Client

# Development configuration - see everything
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(funcName)s - %(message)s",
)

# Use client - all retry details, prompts, and internal state visible
client = Client(provider="groq", model="llama-3.1-8b-instant")
result = client.generate("Generate data", MySchema)
```

PromptShift is designed for 100% reproducible outputs to enable reliable testing, debugging, and production deployments. Every generation is deterministic by default.
- Testing: Write reliable unit tests that verify exact LLM outputs
- Debugging: Reproduce issues consistently to identify root causes
- Reproducibility: Same inputs always produce same outputs (within model version)
- Consistency: Eliminate randomness from production workflows
PromptShift achieves deterministic outputs through three mechanisms:
- Temperature = 0.0: Always uses temperature 0.0 for deterministic sampling
- Automatic Seed Generation: Creates deterministic seed from hash(prompt + JSON schema)
- Consistent Hashing: Same prompt and schema always produce same seed
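The seed-derivation idea can be sketched as below. This is a hypothetical `deterministic_seed` helper under the assumptions above, not the library's actual hashing scheme. Note the use of SHA-256 rather than Python's built-in `hash()`, which is randomized per process (`PYTHONHASHSEED`) and therefore unsuitable for cross-run determinism.

```python
import hashlib
import json

def deterministic_seed(prompt: str, schema: dict) -> int:
    """Derive a stable integer seed from the prompt text plus the JSON schema."""
    canonical = prompt + json.dumps(schema, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return int(digest, 16) % (2**31)  # fit into a typical int seed range

schema = {"properties": {"name": {"type": "string"}, "age": {"type": "integer"}}}
seed_a = deterministic_seed("Generate Alice, age 30", schema)
seed_b = deterministic_seed("Generate Alice, age 30", schema)
seed_c = deterministic_seed("Generate Bob, age 25", schema)

assert seed_a == seed_b  # same prompt + schema -> same seed
assert seed_a != seed_c  # different prompt -> different seed
```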
```python
from PromptShift import Client
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# Generate multiple times - same output every time!
result1 = client.generate("Generate Alice, age 30", Person)
result2 = client.generate("Generate Alice, age 30", Person)
result3 = client.generate("Generate Alice, age 30", Person)

assert result1.name == result2.name == result3.name
assert result1.age == result2.age == result3.age
# All assertions pass - 100% deterministic!
```

For testing scenarios where you need guaranteed reproducibility, you can provide an explicit seed:
```python
# Use an explicit seed for test fixtures
def test_person_generation():
    client = Client(provider="groq", model="llama-3.1-8b-instant")

    # Same seed = same output, even across test runs
    result = client.generate(
        prompt="Generate a person",
        schema=Person,
        seed=12345,  # Explicit seed for reproducibility
    )

    assert result.name == "Alice"  # Predictable output
    assert result.age == 30
```

✅ Guaranteed deterministic when:
- Same prompt text
- Same Pydantic schema definition
- Same model version (e.g., `llama-3.1-8b-instant`)
- Same provider infrastructure

⚠️ Determinism may break when:
- The model version is updated by the provider (e.g., `llama-3.1` → `llama-3.2`)
- Provider infrastructure changes
- The schema definition changes (field names, types, order)
- The prompt text changes (even whitespace differences)
💡 Best Practice: Pin your model versions in production and use explicit seeds in tests for maximum reproducibility.
For more details on seed parameter and determinism options, see the API Reference.
Groq API Seed Documentation: Groq API Docs
PromptShift is designed for minimal overhead - the library adds less than 1ms to your requests (excluding the actual LLM API call).
Library Overhead (excluding LLM API latency):
- Total overhead: <1ms (~0.18ms average)
- Cache key generation: <1ms (~60µs average)
- Schema validation: <10ms for typical schemas (~1-3µs average)
- Cache lookup: <10ms (~185ns average)
- Memory usage: <50MB for 100 cached responses
Caching Performance Benefits:
- Cache hit: Returns instantly (~185ns lookup time)
- Cache miss: Normal generation + validation (~0.18ms overhead)
- Speedup: 50-200x faster for repeated requests
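The caching behavior can be approximated with a small LRU cache keyed on the prompt plus schema. This is an illustrative sketch (the `LRUCache` class and its key scheme are hypothetical, not PromptShift's internals):

```python
import hashlib
import json
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache keyed by (prompt, schema) - illustrative sketch only."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()

    @staticmethod
    def key(prompt, schema):
        raw = prompt + json.dumps(schema, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get(self, k):
        if k not in self._data:
            return None  # cache miss
        self._data.move_to_end(k)  # mark as most recently used
        return self._data[k]

    def put(self, k, value):
        self._data[k] = value
        self._data.move_to_end(k)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(maxsize=100)
k = LRUCache.key("Generate Alice, age 30", {"name": "string", "age": "integer"})
cache.put(k, {"name": "Alice", "age": 30})

assert cache.get(k) == {"name": "Alice", "age": 30}  # hit: returned from memory
assert cache.get("unknown-key") is None              # miss: would trigger generation
```

A dictionary lookup like this is why a cache hit costs nanoseconds while a cache miss pays the full LLM round trip.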
```python
from PromptShift import Client
from pydantic import BaseModel
import time

class Person(BaseModel):
    name: str
    age: int

client = Client(provider="groq", model="llama-3.1-8b-instant")

# First call - cache miss (includes LLM API call ~500-2000ms)
start = time.time()
result1 = client.generate("Generate Alice, age 30", Person)
first_call = time.time() - start
print(f"First call (cache miss): {first_call*1000:.1f}ms")

# Second call - cache hit (instant return)
start = time.time()
result2 = client.generate("Generate Alice, age 30", Person)
cached_call = time.time() - start
print(f"Second call (cache hit): {cached_call*1000:.1f}ms")
print(f"Speedup: {first_call/cached_call:.0f}x faster")

# Output:
# First call (cache miss): 523.4ms
# Second call (cache hit): 0.2ms
# Speedup: 2617x faster
```

For cases where you want fresh results each time (e.g., creative content generation):
```python
# Disable the cache for this request
result = client.generate(
    prompt="Generate a random story",
    schema=Story,
    use_cache=False,  # Skip cache, always call the LLM
)
```

Run the included benchmarks to verify performance on your system:
```shell
# Run all performance benchmarks
uv run pytest tests/performance/test_benchmarks.py --benchmark-only

# Run memory profiling
uv run pytest tests/performance/test_memory.py --memray
```

For detailed performance analysis, see docs/performance.md.
- Python 3.9 or higher (development on 3.12 recommended)
- uv package manager
- Clone the repository:

```shell
git clone https://github.com/aritroCoder/PromptShift.git
cd PromptShift
```

- Install uv (if not already installed):

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Create a virtual environment and install dependencies:

```shell
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev,test,docs]"
```

- Install pre-commit hooks:

```shell
uv run pre-commit install
```

- Run tests to verify setup:

```shell
uv run pytest
```

```shell
# Run all tests (unit tests only, skips integration)
uv run pytest

# Run unit tests only
uv run pytest tests/unit/

# Run tests with coverage
uv run pytest --cov=src/PromptShift --cov-report=html

# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

# Type check
uv run mypy src/

# Run all checks (same as CI)
uv run pre-commit run --all-files
```

Integration tests validate error handling and API behavior with real Groq API calls. These tests are optional and require a valid API key.
Setup:
- Get a Groq API key from https://console.groq.com
- Set the environment variable:

```shell
export GROQ_API_KEY=your_api_key_here
```
Run integration tests:
```shell
# Run all integration tests
GROQ_API_KEY=your_key uv run pytest tests/integration/ -m integration -v

# Run a specific integration test file
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py -m integration -v

# Run a specific test function
GROQ_API_KEY=your_key uv run pytest tests/integration/test_error_scenarios.py::test_invalid_api_key_raises_authentication_error -m integration -v
```

Skip integration tests:

```shell
# Run only unit tests, skip integration tests
uv run pytest -m "not integration"
```

Run GitHub Actions locally (install act with `brew install act` if it is not already installed):

```shell
act -j test -W .github/workflows/test.yml
```
Important Notes:
- Integration tests make real API calls and consume API quota
- Tests are skipped automatically if `GROQ_API_KEY` is not set
- Rate limits may affect test execution if run frequently
- CI runs integration tests automatically if the API key secret is configured (non-blocking)
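The automatic skip behavior above is a standard pytest pattern. If your own suite needs the same behavior, a `conftest.py` hook along these lines would work (a sketch under the assumption that integration tests carry a `@pytest.mark.integration` marker; PromptShift's actual test configuration may differ):

```python
# conftest.py (sketch): auto-skip integration tests when no API key is set
import os
import pytest

def pytest_collection_modifyitems(config, items):
    if os.environ.get("GROQ_API_KEY"):
        return  # key present: run integration tests normally
    skip = pytest.mark.skip(reason="GROQ_API_KEY not set")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```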
Full documentation is available at docs/index.md.
Build and view documentation locally:
```shell
# Install documentation dependencies
uv pip install -e ".[docs]"

# Build documentation
mkdocs build

# Serve locally
mkdocs serve

# Open http://127.0.0.1:8000 in your browser
```

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- Epic 1: Foundation & Core Client API
- Epic 2: Validation, Error Handling & Retry Logic
- Epic 3: Determinism & Caching
- Epic 4: Documentation, Examples & Polish
- 📫 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions