NTLx (Contributor) commented Jan 31, 2026

Summary

This PR implements Memory Compression and Deduplication Optimization for PowerMem as requested in issue #141.

Changes

Core Features

Deduplication (deduplicate())

  • Detect duplicate memories using semantic similarity (embedding-based)
  • Configurable similarity threshold (default: 0.95)
  • Intelligent merge: keep most important memory, combine metadata
  • Dry-run mode to preview changes without applying
  • Detailed report of operations performed
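
The merge step listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the record shape (`id`, `importance`, `metadata`), the helper names `merge_group` and `deduplicate_groups`, and the report keys are assumptions made for the example.

```python
from typing import Any


def merge_group(group: list[dict[str, Any]]) -> tuple[dict[str, Any], list[str]]:
    """Merge one group of duplicates and return (survivor, ids_to_delete).

    Hypothetical record shape: {"id", "content", "importance", "metadata"}.
    """
    # Keep the record with the highest importance score (on a tie, the first wins).
    keeper = max(group, key=lambda m: m.get("importance", 0.0))
    merged_meta: dict[str, Any] = {}
    # Fold in the other records' metadata first so the keeper's values win on conflict.
    for mem in group:
        if mem is not keeper:
            merged_meta.update(mem.get("metadata", {}))
    merged_meta.update(keeper.get("metadata", {}))
    survivor = {**keeper, "metadata": merged_meta}
    to_delete = [m["id"] for m in group if m is not keeper]
    return survivor, to_delete


def deduplicate_groups(groups: list[list[dict[str, Any]]], dry_run: bool = True):
    """Build a report; only produce survivors and count deletions when dry_run is False."""
    report = {"duplicates_found": 0, "deleted": 0}
    survivors = []
    for group in groups:
        if len(group) < 2:
            continue  # a single record has nothing to merge with
        survivor, to_delete = merge_group(group)
        report["duplicates_found"] += len(to_delete)
        if not dry_run:
            survivors.append(survivor)
            report["deleted"] += len(to_delete)
    return report, survivors
```

With `dry_run=True` the report still counts the duplicates that would be removed, which is what makes a preview useful before applying changes.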

Compression (compress())

  • Consolidate similar memories using LLM summarization
  • Three compression strategies:
    • conservative: Keep original wording, remove only obvious duplicates
    • moderate: Combine similar points, remove redundancy
    • aggressive: Very concise summary, keep only essential information
  • Preserve key information while reducing redundancy
  • Maintain memory relationships and metadata
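
One way the three strategies could translate into LLM instructions is sketched below. The prompt wording and the `build_compression_prompt` helper are assumptions for illustration; the PR description does not show the actual prompts.

```python
# Hypothetical mapping from strategy name to an instruction for the LLM.
# The instruction text paraphrases the strategy descriptions in this PR.
STRATEGY_INSTRUCTIONS = {
    "conservative": "Keep the original wording. Remove only obvious duplicates.",
    "moderate": "Combine similar points and remove redundancy.",
    "aggressive": "Write a very concise summary that keeps only essential information.",
}


def build_compression_prompt(memories: list[str], strategy: str = "conservative") -> str:
    """Return a prompt asking an LLM to consolidate the given memories."""
    if strategy not in STRATEGY_INSTRUCTIONS:
        valid = ", ".join(sorted(STRATEGY_INSTRUCTIONS))
        raise ValueError(f"Unknown strategy {strategy!r}; expected one of: {valid}")
    # Number the memories so the model can tell where each one starts and ends.
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(memories, start=1))
    return (
        f"{STRATEGY_INSTRUCTIONS[strategy]}\n\n"
        f"Memories:\n{numbered}\n\n"
        "Consolidated memory:"
    )
```

Rejecting unknown strategy names early keeps a typo such as `"agressive"` from silently falling back to a default.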

Optimization (optimize())

  • Convenience wrapper for deduplication and compression
  • Strategy options: deduplicate, compress, all
  • Unified interface for memory optimization
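
The wrapper's dispatch can be pictured with the sketch below. It is an assumption-laden illustration: the two operations are passed in as callables so the example stays self-contained, and the choice to run deduplication before compression and the result keys are guesses, not documented behavior.

```python
from typing import Any, Callable

VALID_STRATEGIES = ("deduplicate", "compress", "all")


def optimize(
    deduplicate: Callable[..., dict[str, Any]],
    compress: Callable[..., dict[str, Any]],
    strategy: str = "all",
    threshold: float = 0.95,
    dry_run: bool = False,
) -> dict[str, Any]:
    """Run one or both optimization steps and collect their reports."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"strategy must be one of {VALID_STRATEGIES}, got {strategy!r}")
    results: dict[str, Any] = {}
    # Assumed order: deduplicate first so compression sees fewer memories.
    if strategy in ("deduplicate", "all"):
        results["deduplication"] = deduplicate(threshold=threshold, dry_run=dry_run)
    if strategy in ("compress", "all"):
        results["compression"] = compress(dry_run=dry_run)
    return results
```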

API Examples

# Deduplication
results = memory.deduplicate(
    user_id="user123",
    threshold=0.95,  # Similarity threshold
    dry_run=False,   # Apply changes
)
# Returns: {"duplicates_found": 10, "merged": 10, "deleted": 10, "saved_space": "5.00 KB"}

# Compression
results = memory.compress(
    user_id="user123",
    strategy="conservative",  # conservative, moderate, aggressive
    dry_run=False,
)
# Returns: {"memories_analyzed": 20, "compressed": 5, "original_count": 20, "compressed_count": 15}

# Combined optimization
results = memory.optimize(
    user_id="user123",
    strategy="all",
    threshold=0.95,
)

Technical Implementation

  • Embedding-based detection: Uses batch embedding for efficient similarity computation
  • Cosine similarity: Calculates similarity between memory embeddings
  • LLM-powered compression: Uses LLM to create intelligent summaries
  • Safety features:
    • Backup metadata before merging
    • Preserve important information
    • Detailed error handling
    • Progress logging
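
For readers unfamiliar with the similarity step, here is a dependency-free sketch of cosine similarity plus threshold-based grouping. The `group_by_similarity` helper and its union-find grouping are assumptions; the PR does not state how near-duplicates are clustered, and a real implementation would likely vectorize the pairwise comparison.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(embeddings: list[list[float]], threshold: float = 0.95) -> list[list[int]]:
    """Return groups of indices whose embeddings are linked by similarity >= threshold.

    Grouping is transitive: if A matches B and B matches C, all three share a group
    even when A and C fall just below the threshold.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with at least two members contain duplicates.
    return [members for members in groups.values() if len(members) > 1]
```

The pairwise loop is quadratic in the number of memories, which is one reason computing the embeddings in a batch up front matters.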

Testing

  • Added comprehensive unit tests in tests/unit/test_memory_optimization.py
  • Tests cover:
    • Empty memory handling
    • Duplicate detection
    • Different compression strategies
    • Dry-run mode
    • Embedding similarity calculation
    • Edge cases
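
As an illustration of two of the cases listed above (dry-run mode and empty memory), tests of this shape could look like the sketch below. The `deduplicate` function here is a hypothetical stand-in written for the example, not the method under test in `tests/unit/test_memory_optimization.py`.

```python
from unittest.mock import MagicMock


def deduplicate(store, groups, dry_run=False):
    """Hypothetical stand-in: delete every record in a group except the first."""
    found = 0
    for group in groups:
        for memory_id in group[1:]:
            if not dry_run:
                store.delete(memory_id)
            found += 1
    return {"duplicates_found": found, "deleted": 0 if dry_run else found}


def test_dry_run_leaves_store_untouched():
    store = MagicMock()
    report = deduplicate(store, [["a", "b", "c"]], dry_run=True)
    assert report == {"duplicates_found": 2, "deleted": 0}
    # The preview must not touch storage.
    store.delete.assert_not_called()


def test_empty_memory_is_a_no_op():
    store = MagicMock()
    report = deduplicate(store, [], dry_run=False)
    assert report == {"duplicates_found": 0, "deleted": 0}
    store.delete.assert_not_called()
```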

Documentation

  • Clear docstrings for all new methods
  • Usage examples in code comments
  • Strategy parameter explanations

Related Issues


Closes #141

- Add deduplicate() method for finding and merging duplicate memories
- Add compress() method for consolidating similar memories using LLM
- Add optimize() method as convenience wrapper
- Support configurable similarity threshold for deduplication
- Support strategies: conservative, moderate, aggressive
- Support dry-run mode to preview changes
- Add comprehensive unit tests
- Implement embedding-based similarity detection
- Add batch embedding support for efficiency

Related to oceanbase/seekdb#123
Closes oceanbase#141

Teingi (Member) commented Jan 31, 2026

The test pipeline has failed.

- Fix invalid syntax: compressed_count = len(group) - 1 for group...
- Update test mocks to properly mock _find_duplicate_groups method
- Ensure all unit tests pass with correct mock configurations

Closes oceanbase#215
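
For context on the two fixes named in this commit message: a bare generator expression on the right-hand side of an assignment is a syntax error in Python, and wrapping it in `sum(...)` is one valid form. The sketch below shows that form together with a mock of `_find_duplicate_groups`. Only that method name comes from this thread; `MemoryStub`, `count_redundant`, and the completed expression are assumptions, since the actual patch is not shown here.

```python
from unittest.mock import patch


class MemoryStub:
    """Hypothetical stand-in for the class under test."""

    def _find_duplicate_groups(self, threshold):
        # In a real test this would hit the embedding service, hence the mock.
        raise RuntimeError("would call the embedding service")

    def count_redundant(self, threshold=0.95):
        groups = self._find_duplicate_groups(threshold)
        # Invalid (as quoted in the commit message):
        #   compressed_count = len(group) - 1 for group...
        # One valid form (assumed): sum the redundant members of each group.
        compressed_count = sum(len(group) - 1 for group in groups)
        return compressed_count


def run_with_mock():
    memory = MemoryStub()
    fake_groups = [["a", "b"], ["c", "d", "e"]]
    # Replace the private method on the class so no embeddings are computed.
    with patch.object(MemoryStub, "_find_duplicate_groups", return_value=fake_groups) as mocked:
        count = memory.count_redundant(threshold=0.9)
    mocked.assert_called_once_with(0.9)
    return count
```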

NTLx (Contributor, Author) commented Jan 31, 2026

The test failures have been fixed.

Issues resolved:
1. Fixed syntax error in line 1949 ()
2. Fixed test mocks to properly mock method
3. All 18 memory optimization tests now pass ✅
4. All 142 unit tests pass ✅
5. All 7 list memories sorting tests pass ✅

Tests have been re-run and are passing. Please re-trigger the CI pipeline to verify.


Development

Successfully merging this pull request may close these issues.

[Feature]: Memory Compression and Deduplication Optimization

2 participants