A modern, production-ready Python package for intelligent PDF summarization with AI-powered text processing. Designed to condense large PDF documents while preserving critical information like code blocks and equations.
- Smart Text Processing: Enhanced word separation and text cleaning algorithms
- AI-Powered Summarization: Uses Facebook BART model for high-quality summaries
- Professional CLI: Easy-to-use command-line interface with
max-agentcommand - Flexible Configuration: YAML-based settings for complete customization
- Quality Output: Generates clean, formatted PDF summaries
- Modern Packaging: Proper Python package with pip installation support
- Fully Tested: Comprehensive test suite with type hints throughout
# Clone the repository
git clone https://github.com/Tennisee-data/Max_Agent.git
cd Max_Agent
# Install the package
pip install -e .
# Run with your PDFs
max-agent --help
max-agent --log-level INFO- ChatGPT Agent Optimization: Condense support documents to maximize token efficiency
- Document Analysis: Quickly extract key information from large PDFs
- Research Workflows: Summarize academic papers and technical documents
- Content Preparation: Process documents for AI agent configurations
src/max_agent/ # Main package
├── __init__.py # Package initialization
├── cli.py # Command-line interface
├── config.py # Configuration management
├── pdf_processor.py # Enhanced PDF processing
├── summariser.py # AI summarization engine
├── pdf_generator.py # PDF output generation
├── file_tracker.py # Processing state management
└── config.yaml # Default configuration
tests/ # Test suite
├── test_pdf_processor.py # Core functionality tests
└── ...
pyproject.toml # Modern Python packaging
requirements.txt # Dependencies
README.md # This file
Customize processing via config.yaml:
# Input/Output directories
input_directory: "./pdfs/"
output_directory: "./summarised_pdfs/"
cleaned_text_directory: "./cleaned_texts/"
# AI Model settings
summarization_model_id: "facebook/bart-large-cnn"
max_tokens: 1024
# Summarization ratios
first_min_ratio: 0.25
first_max_ratio: 0.45
second_min_ratio: 0.60
second_max_ratio: 0.80# Basic usage
max-agent
# Custom configuration
max-agent --config my_config.yaml
# Debug mode
max-agent --log-level DEBUG
# Check version
max-agent --versionOriginally designed to overcome ChatGPT's 20 PDF / 2M token limitations by intelligently condensing source documents:
- Token Efficiency: Reduce document size while preserving key information
- Smart Processing: Handles complex document structures
- Quality Preservation: Maintains code blocks, equations, and critical details
- Optimal Format: Works best with single-column text PDFs (OpenAI recommended)
- Text Extraction: Advanced PDF processing with
pdfplumber - AI Model: Facebook BART-large-CNN for summarization
- Word Segmentation: Intelligent text separation with
wordsegment - Output Format: Professional PDF generation with DejaVu fonts
- Code Quality: Black formatted, type-hinted, fully tested
- Python 3.8+
- Dependencies automatically installed via pip
- Compatible with CPU and GPU processing
# Install development dependencies
pip install -e .
# Run tests
python -m pytest tests/
# Code formatting
black src/max_agent/
# Linting
flake8 src/max_agent/- PDF Ingestion: Extracts text while preserving document structure
- Text Cleaning: Advanced word separation and content cleaning
- Smart Chunking: Divides content into optimal token-sized segments
- AI Summarization: Uses BART model for intelligent condensation
- Content Preservation: Maintains code blocks and equations separately
- PDF Generation: Creates professional formatted summary documents
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
François REEVES - francois@reevesnco.com
- Built with modern Python packaging standards
- Uses Facebook's BART model for summarization
- Powered by the transformers library
- Enhanced text processing with wordsegment
Star this repo if Max Agent helped you optimize your ChatGPT agents!