[FEATURE] Agent State Serialization #21

@afarntrog

Agent State Serialization Enhancement Plan

Problem Statement

Currently, Strands agents require agent.state to be JSON-serializable, limiting it to basic Python types (str, int, dict, list, etc.). This creates two distinct problems:

Problem 1: Rich Type Serialization

Users cannot store:

  • Python dataclasses
  • Pydantic models
  • Custom objects with complex behavior
  • Rich types like datetime, Decimal, UUID

This limitation forces users to write boilerplate serialization/deserialization code.

Problem 2: Runtime-Only Resources

Even with serialization improvements, some objects fundamentally cannot be persisted:

  • Database connections (sqlite3.Connection, psycopg2.connection)
  • API clients (boto3.client, httpx.Client)

These are runtime resources, not data. They cannot be meaningfully serialized because they hold system handles that are valid only in the current process; recreating them from serialized state would not restore the actual resource.

This prevents clean, Pythonic patterns that competing frameworks like Pydantic AI support.

Motivation

Pydantic AI and similar frameworks allow arbitrary Python objects in agent state, enabling cleaner code:

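from dataclasses import dataclass
from datetime import datetime
from typing import Any
from uuid import UUID

from pydantic_ai import Agent

@dataclass
class AgentState:
    db_connection: DatabaseConnection  # Complex object (placeholder for an application-defined class)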
    created_at: datetime               # Rich type
    user_id: UUID                      # Rich type
    cache: dict[str, Any]              # Still works

agent = Agent('openai:gpt-4', deps_type=AgentState)

Benefits

  • No serialization boilerplate - Use Python objects directly without conversion code
  • Richer type system - Leverage dataclasses, Pydantic models, enums, datetime, Decimal, etc.
  • Complex state - Store database connections, API clients, file handles (runtime-only)
  • Better type safety - Full IDE support and type checking for state objects
  • Cleaner code - Business logic without serialization/deserialization

Design Decisions

1. Pluggable Serializers

We will introduce a pluggable serializer architecture that allows users to choose their serialization strategy:

  • JSONSerializer (default) - Backward compatible, human-readable, validates on set()
  • PickleSerializer - Supports any Python object, no validation on set()
  • Custom serializers - Users can implement the StateSerializer protocol

2. Configuration Location

After considering multiple options, we decided:

  • Agent construction: Accept state_serializer parameter for convenience
  • AgentState object: All further access and modification via agent.state.serializer
  • No property delegation: Agent won't have getter/setter for serializer
  • Conflict handling: Raise an error if both an AgentState instance (passed as state) and state_serializer are provided

# At construction
agent = Agent(model, state_serializer=PickleSerializer())

# Later modification
agent.state.serializer = JSONSerializer()

3. Validation Strategy

Validation is delegated to the serializer:

  • JSONSerializer validates on set() to maintain current behavior
  • PickleSerializer has no validation (accepts anything)
  • Custom serializers can implement their own validation logic
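
As an illustration of a custom serializer with its own validation, here is a minimal sketch that keeps state human-readable while supporting a few rich types. It assumes the StateSerializer protocol exposes serialize, deserialize and validate; the class name RichJSONSerializer and the "__type__" tag are hypothetical and not part of the proposal.

```python
import json
from datetime import datetime
from decimal import Decimal
from typing import Any
from uuid import UUID

# Hypothetical tag key used to mark encoded rich values.
_TAG = "__type__"


def _encode(obj: Any) -> Any:
    """json.dumps `default` hook: called only for objects json cannot encode."""
    if isinstance(obj, datetime):
        return {_TAG: "datetime", "value": obj.isoformat()}
    if isinstance(obj, UUID):
        return {_TAG: "uuid", "value": str(obj)}
    if isinstance(obj, Decimal):
        return {_TAG: "decimal", "value": str(obj)}
    raise TypeError(f"Unsupported type: {type(obj).__name__}")


def _decode(obj: dict[str, Any]) -> Any:
    """json.loads `object_hook`: rebuild tagged values, pass other dicts through."""
    tag = obj.get(_TAG)
    if tag == "datetime":
        return datetime.fromisoformat(obj["value"])
    if tag == "uuid":
        return UUID(obj["value"])
    if tag == "decimal":
        return Decimal(obj["value"])
    return obj


class RichJSONSerializer:
    """Sketch of a custom serializer: JSON plus datetime, UUID and Decimal."""

    def serialize(self, data: dict[str, Any]) -> bytes:
        return json.dumps(data, default=_encode).encode("utf-8")

    def deserialize(self, data: bytes) -> dict[str, Any]:
        return json.loads(data.decode("utf-8"), object_hook=_decode)

    def validate(self, value: Any) -> None:
        # Reject anything the encoder cannot handle at set() time.
        try:
            json.dumps(value, default=_encode)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"Value cannot be serialized: {exc}") from exc
```

One limitation of this sketch: a user dict that happens to contain the "__type__" key with one of the tag names would be decoded as a rich value.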

4. Backward Compatibility

  • Default serializer is JSONSerializer → existing code works unchanged
  • JSONSerializableDict will be deprecated but kept for compatibility
  • Current JSON validation behavior preserved for default case

5. Runtime-Only State (Transient Values)

To handle runtime resources that cannot be persisted (database connections, API clients, etc.), we will add a persist parameter to the set() method:

  • Default behavior: persist=True - values are validated and serialized
  • Runtime-only: persist=False - values are kept in memory but excluded from serialization
  • Unified retrieval: Single get() method works for both persistent and transient values
  • Explicit opt-out: Runtime-only must be explicitly marked at call site

This solves the Runtime-Only Resources problem while maintaining backward compatibility.

Implementation Architecture

Core Components

# src/strands/state/serializers.py

from typing import Protocol, Any, runtime_checkable

@runtime_checkable
class StateSerializer(Protocol):
    """Protocol for state serializers."""
    
    def serialize(self, data: dict[str, Any]) -> bytes:
        """Serialize state dict to bytes."""
        ...
    
    def deserialize(self, data: bytes) -> dict[str, Any]:
        """Deserialize bytes back to state dict."""
        ...
    
    # Optional validation method
    def validate(self, value: Any) -> None:
        """Validate a value can be serialized (optional)."""
        ...
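
The two built-in serializers are named in this proposal but not spelled out. A minimal sketch of how they might satisfy the protocol, assuming JSON is stored as UTF-8 bytes and that validation failures surface as ValueError (as in the usage examples), could look like this:

```python
import json
import pickle
from typing import Any


class JSONSerializer:
    """Sketch of the default serializer: JSON text encoded as UTF-8 bytes."""

    def serialize(self, data: dict[str, Any]) -> bytes:
        return json.dumps(data).encode("utf-8")

    def deserialize(self, data: bytes) -> dict[str, Any]:
        return json.loads(data.decode("utf-8"))

    def validate(self, value: Any) -> None:
        # json.dumps raises TypeError for unsupported types and ValueError
        # for circular references; normalize both to ValueError.
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"Value is not JSON serializable: {exc}") from exc


class PickleSerializer:
    """Sketch of the pickle-backed serializer: accepts any picklable object."""

    def serialize(self, data: dict[str, Any]) -> bytes:
        return pickle.dumps(data)

    def deserialize(self, data: bytes) -> dict[str, Any]:
        # Only call this on trusted input; unpickling can execute code.
        return pickle.loads(data)

    def validate(self, value: Any) -> None:
        # Intentionally a no-op: no validation on set().
        return None
```

Note that because validate is declared on a runtime_checkable protocol, an isinstance check against StateSerializer would require it to be present; giving PickleSerializer a no-op validate is one way to keep both the isinstance check and the hasattr check in AgentState.set() consistent.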

AgentState Refactoring

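# src/strands/agent/state.py

import copy
from typing import Any

# Assumed location, based on the serializers module path above
from strands.state.serializers import JSONSerializer, StateSerializer

class AgentState: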
    """Flexible state container with pluggable serialization and runtime-only support."""
    
    def __init__(
        self, 
        initial_state: dict[str, Any] | None = None,
        serializer: StateSerializer | None = None
    ):
        self._data: dict[str, Any] = initial_state or {}
        self._transient_keys: set[str] = set()  # Track runtime-only keys
        self.serializer = serializer or JSONSerializer()
    
    def set(self, key: str, value: Any, *, persist: bool = True) -> None:
        """Set value with optional persistence.
        
        Args:
            key: The key to store the value under
            value: The value to store
            persist: If False, value is transient (not serialized). Default True.
        """
        self._validate_key(key)
        
        if persist:
            # Validate serializable (existing behavior)
            if hasattr(self.serializer, 'validate'):
                self.serializer.validate(value)
            self._transient_keys.discard(key)
        else:
            # Mark as transient - skip validation
            self._transient_keys.add(key)
        
        self._data[key] = value
    
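    def get(self, key: str | None = None) -> Any:
        """Get value - works uniformly for persistent and transient."""
        # Persistent values are deep-copied. Transient values are returned by
        # reference: runtime resources such as sqlite3.Connection cannot be copied.
        if key is None:
            return {k: (v if k in self._transient_keys else copy.deepcopy(v))
                    for k, v in self._data.items()}
        value = self._data.get(key)
        if key in self._transient_keys:
            return value
        return copy.deepcopy(value)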
    
    def is_transient(self, key: str) -> bool:
        """Check if a key is transient (not persisted)."""
        return key in self._transient_keys
    
    def serialize(self) -> bytes:
        """Serialize only persistent keys."""
        persistent_data = {k: v for k, v in self._data.items() 
                          if k not in self._transient_keys}
        return self.serializer.serialize(persistent_data)
    
    def deserialize(self, data: bytes) -> None:
        """Deserialize persistent state. Transient keys are preserved if in memory."""
        persistent_data = self.serializer.deserialize(data)
        # Keep transient keys in memory, replace persistent
        transient_data = {k: v for k, v in self._data.items() 
                         if k in self._transient_keys}
        self._data = {**persistent_data, **transient_data}

Session Manager Updates

Session managers will use the agent's serializer for persistence:

# In FileSessionManager.sync_agent()
serialized_state = agent.state.serialize()  # Returns bytes
# Store serialized_state appropriately
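
Because serialize() returns bytes, a session manager that writes JSON files cannot embed the result directly. One possible approach, shown here as a sketch only, is to base64-encode the bytes before placing them in the session record. The helper names and the record fields ("agent_id", "state") are hypothetical.

```python
import base64
import json


def encode_state_for_json(serialized_state: bytes) -> str:
    """Turn serializer output into an ASCII string that is safe inside JSON."""
    return base64.b64encode(serialized_state).decode("ascii")


def decode_state_from_json(encoded: str) -> bytes:
    """Recover the original bytes to hand back to AgentState.deserialize()."""
    return base64.b64decode(encoded)


# Usage: any bytes payload works, including pickle output.
payload = b"\x80\x04example-bytes"
record = json.dumps({"agent_id": "default", "state": encode_state_for_json(payload)})
restored = decode_state_from_json(json.loads(record)["state"])
```

Storing the serializer's name next to the payload would also let a session manager detect a mismatch between the stored format and the agent's current serializer, but that is outside the scope of this sketch.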

Implementation Phases

Phase 1: Core Serialization Infrastructure

  1. Create serializer protocol and implementations
  2. Refactor AgentState to use pluggable serializers
  3. Update Agent constructor

Phase 2: Session Manager Integration

  1. Update session types to handle serialized state
  2. Modify FileSessionManager to use agent serialization
  3. Modify S3SessionManager similarly
  4. Update RepositorySessionManager

Phase 3: Exports and Types

  1. Update public API exports
  2. Add necessary type definitions

Phase 4: Testing

  1. Unit tests for serializers
  2. Unit tests for AgentState with different serializers
  3. Unit tests for Agent serializer configuration
  4. Integration tests with rich types

Phase 5: Cleanup and Documentation

  1. Deprecate JSONSerializableDict (kept for backward compatibility, per Design Decision 4)
  2. Update BidiAgent for consistency

Usage Examples

Basic Usage (Backward Compatible)

# Default behavior - JSON serialization with validation
agent = Agent(model)
agent.state.set("count", 42)  # ✅ Works
agent.state.set("created", datetime.now())  # ❌ ValueError: not JSON serializable

Rich Types with Pickle

from strands import Agent, PickleSerializer
from datetime import datetime
from uuid import UUID

agent = Agent(model, state_serializer=PickleSerializer())

# Store rich Python types
agent.state.set("created_at", datetime.now())  # ✅
agent.state.set("user_id", UUID('...'))        # ✅
agent.state.set("config", MyConfigClass())     # ✅

# Session persistence just works
session_manager.sync_agent(agent)  # Automatically uses pickle

Custom State Class

@dataclass
class CustomerState:
    customer_id: UUID
    last_interaction: datetime
    preferences: dict[str, Any]
    
agent = Agent(
    model,
    state={"customer": CustomerState(...)},
    state_serializer=PickleSerializer()
)

Runtime-Only State (Transient Values)

import sqlite3
from datetime import datetime

import boto3
from strands import Agent, PickleSerializer, ToolContext, tool

agent = Agent(model="...", state_serializer=PickleSerializer())

# Persistent state (default behavior)
agent.state.set("user_id", "12345")                    # ✅ Persisted
agent.state.set("session_data", {"cart": []})          # ✅ Persisted
agent.state.set("created_at", datetime.now())          # ✅ Persisted (with Pickle)

# Runtime-only state (explicit opt-out)
agent.state.set("db", sqlite3.connect(":memory:"), persist=False)  # ✅ Works
agent.state.set("s3_client", boto3.client("s3"), persist=False)    # ✅ Works
agent.state.set("temp_cache", {}, persist=False)                   # ✅ Works

# Unified retrieval - no need to know if transient
db = agent.state.get("db")           # Get runtime resource
user_id = agent.state.get("user_id") # Get persistent data

# Check if transient
agent.state.is_transient("db")       # True
agent.state.is_transient("user_id")  # False

# Use in tools (requires: from strands import tool, ToolContext)
@tool(context=True)
def query_database(query: str, tool_context: ToolContext) -> str:
    db = tool_context.agent.state.get("db")
    if db is None:
        raise ValueError("Database not initialized")
    return str(db.execute(query).fetchall())

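# Serialization behavior
checkpoint = agent.state.serialize()  # Only includes user_id, session_data, created_at

# Restoring into a fresh agent (e.g. after a process restart): transient values are gone.
# Calling deserialize() on the same in-memory agent would keep its transient keys.
restored = Agent(model="...", state_serializer=PickleSerializer())
restored.state.deserialize(checkpoint)
restored.state.get("db")        # None (not persisted)
restored.state.get("user_id")   # "12345" (persisted)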

Checkpointing Without Session Manager

# Serialize for checkpointing
checkpoint = agent.state.serialize()
save_to_file(checkpoint)

# Later restore
checkpoint = load_from_file()
agent.state.deserialize(checkpoint)

Security Considerations

Pickle Security Risks

Pickle can execute arbitrary code during deserialization. This is a known security risk. Users should:

  1. Only unpickle data from trusted sources
  2. Never unpickle data received over network from untrusted sources
  3. Consider using hmac to verify data integrity
  4. Use custom serializers with restricted functionality if needed

We will document these risks clearly in the API documentation.
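
To make the hmac suggestion concrete, here is a minimal sketch that signs a checkpoint before it is stored and refuses to unpickle it if the signature does not match. The function names are hypothetical, and key management (where the secret key lives) is assumed to be handled elsewhere.

```python
import hashlib
import hmac
import pickle

_DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag computed over it."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify(blob: bytes, key: bytes) -> bytes:
    """Return the payload if its tag matches, otherwise raise ValueError."""
    tag, payload = blob[:_DIGEST_SIZE], blob[_DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid timing side channels.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("State checkpoint failed integrity check")
    return payload


# Usage: verify first, unpickle only afterwards.
key = b"replace-with-a-real-secret"
blob = sign(pickle.dumps({"user_id": "12345"}), key)
state = pickle.loads(verify(blob, key))
```

This only shows that the checkpoint was produced by someone holding the key. It does not make pickle safe against a party who has the key, and it does not encrypt the data.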

Migration Guide

For Existing Users

No changes required. Existing code continues to work:

# This still works exactly as before
agent = Agent(model)
agent.state.set("key", "value")

Upgrading to Rich Types

# Old way - manual serialization
agent.state.set("created", datetime.now().isoformat())  # Convert to string
created = datetime.fromisoformat(agent.state.get("created"))  # Parse back

# New way - direct storage
agent = Agent(model, state_serializer=PickleSerializer())
agent.state.set("created", datetime.now())  # Store directly
created = agent.state.get("created")  # Already a datetime

Testing Strategy

Unit Tests

  • Test each serializer with valid/invalid inputs
  • Test AgentState with different serializers
  • Test validation delegation
  • Test serialize/deserialize round-trips
  • Test Agent constructor parameter handling

Integration Tests

  • Session persistence with PickleSerializer
  • Rich type round-trips (datetime, UUID, Decimal, dataclass, Pydantic)
  • Multi-agent scenarios with different serializers
  • Backward compatibility verification

Performance Tests

  • Compare serialization speed (JSON vs Pickle)
  • Memory usage with large state objects
  • Session sync performance
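
A rough starting point for the serialization speed comparison, using only the standard library, might look like the sketch below. The sample state and iteration count are arbitrary assumptions, and timings will vary by machine and payload shape.

```python
import json
import pickle
import timeit

# Arbitrary sample state: JSON-compatible so both formats can handle it.
state = {"user_id": "12345", "items": [{"sku": i, "qty": i % 5} for i in range(1_000)]}


def time_round_trip(dumps, loads, number: int = 200) -> float:
    """Return seconds taken for `number` serialize/deserialize round-trips."""
    return timeit.timeit(lambda: loads(dumps(state)), number=number)


json_seconds = time_round_trip(json.dumps, json.loads)
pickle_seconds = time_round_trip(pickle.dumps, pickle.loads)
print(f"JSON:   {json_seconds:.4f}s")
print(f"Pickle: {pickle_seconds:.4f}s")
```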

Documentation Updates

  1. Update Agent API docs with state_serializer parameter
  2. Add serialization guide to documentation
  3. Document security considerations for Pickle
  4. Provide migration examples
  5. Update tutorials with rich type examples
