Text manipulation, formatting, and processing utilities for AI agents.
All planned text processing modules have been implemented and tested:
- ✅ Text cleaning and whitespace normalization
- ✅ String case conversion (snake_case, camelCase, Title Case)
- ✅ HTML tag stripping and text extraction
- ✅ Unicode normalization and encoding handling
- ✅ Line ending normalization across platforms
- ✅ Smart text splitting with word preservation
- ✅ Sentence extraction and text parsing
- ✅ Advanced text joining with proper formatting
- Unicode-aware text processing
- Functions designed as individual agent tools
- Memory-efficient string operations
- Clear error messages and handling
- Platform-independent line ending handling
- Consistent API design across modules
- Functions suitable for agent framework integration
- Clear function signatures optimized for AI tool usage
- Safe text processing without code execution risks
- Configurable behavior for different use cases
- Focus on text transformation, not analysis (NLP belongs in separate module)
- Exclude regex-heavy operations (provide simple, predictable transformations)
- Support for common text formatting needs
- Preserve text integrity during transformations
- Handle edge cases gracefully
- Natural Language Processing - Sentiment analysis, entity extraction, language detection (requires ML models)
- Text Analytics - Word frequency, readability scores, statistical analysis (separate analytics module)
- Template Engines - Complex templating systems (separate templating module)
- Markdown/Rich Text Processing - Complex markup parsing and rendering (separate markup module)
clean_whitespace(text: str) -> str- Remove extra spaces, tabs, and normalize whitespacenormalize_line_endings(text: str, style: str) -> str- Standardize line endings (unix/windows/mac)strip_html_tags(text: str) -> str- Remove HTML tags and extract plain textnormalize_unicode(text: str, form: str) -> str- Normalize Unicode text (NFC, NFD, NFKC, NFKD)
to_snake_case(text: str) -> str- Convert to snake_case formatto_camel_case(text: str, upper_first: bool) -> str- Convert to camelCase or PascalCaseto_title_case(text: str) -> str- Convert to Title Case with smart capitalization
smart_split_lines(text: str, max_length: int, preserve_words: bool) -> List[str]- Split text into lines with word preservationextract_sentences(text: str) -> List[str]- Extract sentences with proper boundary detectionjoin_with_oxford_comma(items: List[str], conjunction: str) -> str- Join list items with proper grammar
- No code execution or eval operations
- Safe HTML tag removal without parsing vulnerabilities
- Input validation for all transformation functions
- Memory limits for large text processing
- Protection against algorithmic complexity attacks
- Efficient string operations using built-in methods
- Memory-conscious processing for large texts
- Optimized regex patterns where used
- Minimal copying for transformation operations
Compatible with:
- Google ADK: Direct function imports as tools
- LangChain: Wrappable with StructuredTool
- Strands Agents: Native @strands_tool decorator support
- Custom Agents: Simple function-based API
import basic_open_agent_tools as boat
# Load text processing tools
text_tools = boat.load_all_text_tools()
# Individual function usage
from basic_open_agent_tools.text import clean_whitespace, to_snake_case
# Clean up messy text
cleaned_text = clean_whitespace(" Hello world \\n\\n ")
# Convert to snake_case for programming
variable_name = to_snake_case("User Full Name") # "user_full_name"- processing.py - Core text processing and transformation functions (10 functions)
- Data Cleaning: Normalize whitespace and line endings in imported text
- Code Generation: Convert natural language to programming identifiers
- Content Processing: Clean HTML content and extract readable text
- Text Formatting: Prepare text for display with proper capitalization
- Data Import: Standardize text data from various sources
- Template Processing: Format lists and text for human-readable output
Many functions can be chained together for comprehensive text processing:
normalize_unicode()- Standardize character encodingstrip_html_tags()- Remove markup if presentclean_whitespace()- Normalize spacingnormalize_line_endings()- Standardize line breaks- Apply case conversion as needed
Total Functions: 10 agent-ready tools with Google ADK compatibility