Critical fixes:
- Add configurable retry_prompt parameter to Reflector class (English default)
- Add configurable retry_prompt parameter to Curator class (English default)
- Replace hardcoded Chinese retry prompts with a configurable system
- All three roles (Generator, Reflector, Curator) are now consistent
- Update .gitignore to exclude checkpoint and evaluation result JSON files

This completes the refactoring started in commit 087e2ed, where we fixed the Generator's Chinese prompt. All three ACE roles now use the same pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
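A minimal sketch of the configurable retry prompt described above, assuming a role that retries when the LLM returns invalid JSON. The `llm` callable, `max_retries`, and the default wording are illustrative assumptions, not ACE's actual Reflector implementation.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical English default; ACE's actual default wording may differ.
DEFAULT_RETRY_PROMPT = (
    "\n\nIMPORTANT: Your previous answer was not valid JSON. "
    "Reply with a single valid JSON object and nothing else."
)


class Reflector:
    """Sketch of a role whose retry prompt is a constructor parameter."""

    def __init__(
        self,
        llm: Callable[[str], str],
        max_retries: int = 3,
        retry_prompt: str = DEFAULT_RETRY_PROMPT,
    ) -> None:
        self.llm = llm
        self.max_retries = max_retries
        self.retry_prompt = retry_prompt

    def reflect(self, prompt: str) -> Any:
        current = prompt
        last_error: Optional[Exception] = None
        for _ in range(self.max_retries):
            raw = self.llm(current)
            try:
                return json.loads(raw)
            except json.JSONDecodeError as exc:
                last_error = exc
                # Retry with the configurable suffix instead of a hardcoded string.
                current = prompt + self.retry_prompt
        raise ValueError(f"no valid JSON after {self.max_retries} attempts") from last_error
```

Passing a different `retry_prompt` (for example, one written in Chinese) restores language-specific behavior without editing the class.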
Update 4 example files to use prompts_v2_1 instead of the deprecated prompts_v2:
- examples/helicone/convex_training.py
- examples/advanced_prompts_v2.py
- examples/helicone/offline_training_replay.py
- examples/browser-use/ace_domain_checker.py

Note: compare_v1_v2_prompts.py and compare_v2_v2_1_prompts.py intentionally keep their prompts_v2 imports because they explicitly compare prompt versions.

The examples now demonstrate current best practices.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive documentation for recent improvements:
- Configurable retry_prompt parameter (Generator, Reflector, Curator)
- Checkpoint saving during training (checkpoint_interval, checkpoint_dir)
- Prompt version guidance (v1.0 simple, v2.0 deprecated, v2.1 recommended)
- Feature detection utilities (ace/features.py)
- Updated test coverage section (mentions integration tests)

Also update the module structure to reflect:
- prompts_v2.py marked as DEPRECATED
- prompts_v2_1.py marked as RECOMMENDED
- New features.py module

CLAUDE.md now serves as a complete developer reference.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
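The checkpoint options documented above can be illustrated with a toy loop. This is a sketch under stated assumptions: `train_with_checkpoints` is a hypothetical function, the playbook is modeled as a plain dict, and the file naming scheme is invented; only the `checkpoint_interval` and `checkpoint_dir` parameter names come from the commit.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Optional


def train_with_checkpoints(
    samples: Iterable[str],
    playbook: Dict[str, Any],
    checkpoint_interval: Optional[int] = None,
    checkpoint_dir: Optional[str] = None,
) -> List[Path]:
    """Toy training loop that saves the playbook every checkpoint_interval samples."""
    saved: List[Path] = []
    for step, sample in enumerate(samples, start=1):
        # Placeholder for the real Generator -> Reflector -> Curator update.
        playbook.setdefault("seen", []).append(sample)

        # Checkpointing is skipped entirely unless both options are set.
        if checkpoint_interval and checkpoint_dir and step % checkpoint_interval == 0:
            out_dir = Path(checkpoint_dir)
            out_dir.mkdir(parents=True, exist_ok=True)
            path = out_dir / f"checkpoint_{step}.json"
            path.write_text(json.dumps(playbook, indent=2))
            saved.append(path)
    return saved
```

Writing each checkpoint to its own file means a crash late in a long run loses at most `checkpoint_interval` samples of learning.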
- Fix Python version requirement in SETUP_GUIDE.md (3.9 → 3.11)
- Fix async test decorator in test_litellm_client.py
- Export DataLoader from benchmarks/loaders for API consistency
- Update examples to use the recommended prompts_v2_1 instead of the deprecated prompts_v2
- Remove unnecessary sys.path manipulation from all example files

All 79 tests pass. This resolves critical documentation inconsistencies and improves code quality across the examples and test suite.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Simplified README quickstart from 55 lines to ~35 lines
- Added built-in SimpleEnvironment class to make getting started easier
- Removed the need for a custom environment class in the quickstart
- Made the quickstart more progressive: basic usage → learning
- Added SimpleEnvironment to ace exports
- Added links to full examples for users who want more

The new quickstart is much more approachable for beginners while still showing ACE's core value (learning from examples).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updates to core ACE components:
- Enhanced delta operations and playbook functionality
- Improved prompts v2.1 with better role implementations
- Updated browser automation examples for domain checking

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
feat: Enhance browser-use demos and core ACE framework
- Added new demo section showcasing ACE vs. baseline browser automation
- Includes performance metrics: 30% → 100% success rate, 38.8 → 6.9 average steps
- Added demo results image with detailed comparison data
- Shows ACE's autonomous learning and optimization capabilities

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes a JSON serialization error when Sample objects are passed via kwargs to LLM completion calls. The 'sample' parameter is used by ReplayGenerator but cannot be serialized when LiteLLM attempts to log it to Opik tracing.

Changes:
- Generator._generate_impl(): filter 'sample' from kwargs before llm.complete()
- Reflector._reflect_impl(): filter 'sample' from kwargs before llm.complete()
- Curator.curate(): filter 'sample' from kwargs before llm.complete()

This preserves ReplayGenerator functionality while preventing serialization errors when Opik observability is enabled. Based on LiteLLM best practices for handling custom metadata in kwargs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
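The kwargs filtering described in this fix can be sketched as follows. `Sample`, `complete`, and `generate` are hypothetical stand-ins (not ACE's or LiteLLM's real code); `json.dumps(kwargs)` imitates a tracing logger that serializes every keyword argument it receives.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Sample:
    """Hypothetical stand-in for ACE's Sample; dataclass instances are not JSON-serializable."""

    question: str


def complete(prompt: str, **kwargs: Any) -> str:
    """Stand-in for llm.complete(); a tracing logger serializes kwargs like this."""
    json.dumps(kwargs)  # raises TypeError if a Sample object is present
    return "ok"


def generate(prompt: str, **kwargs: Any) -> str:
    # 'sample' remains usable inside this method (e.g. for replay),
    # but it is dropped before the kwargs are forwarded to the LLM call.
    llm_kwargs = {k: v for k, v in kwargs.items() if k != "sample"}
    return complete(prompt, **llm_kwargs)
```

Filtering at the call site keeps the role's public signature unchanged, so callers such as a replay generator can keep passing `sample=...`.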
- Allows manual triggering of tests via the GitHub UI or CLI
- Helps diagnose why automatic workflow triggers stopped after Oct 22
- Updates workflow registration with GitHub Actions
- Add proper type casts and annotations in delta.py
- Fix missing Any import in adaptation.py
- Add proper Optional type handling in playbook.py
- Fix None return types in prompts_v2.py and prompts_v2_1.py
- Fix optional dependency type annotations across all modules
- Add TYPE_CHECKING guards for conditional imports
- Fix decorator signature inconsistencies in roles.py
- Resolve dict.get() type issues
- Fix Router type assignments in litellm_client.py

All 46 mypy errors have been addressed.
- Fix no-redef errors by declaring type annotations before assignments
- Add missing List import in prompts_v2_1.py
- Fix Dict[str, Any] type annotation for the comparisons dict
- Add proper cast for int() in playbook.py
- Add Optional[Any] type annotation for router in litellm_client.py
- Use type: ignore[assignment] for conditional type assignments

All mypy errors should now be resolved.
- Check whether OpikLogger is None before calling its constructor
- Add type: ignore[misc] for the instantiation
- Ensure mypy passes in all optional-dependency scenarios

mypy now reports: Success: no issues found in 16 source files
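The optional-dependency handling from the three mypy commits above can be sketched like this. `hypothetical_tracing_pkg`, `TracingLogger`, `is_available`, and `make_logger` are illustrative names, not OpikLogger's real import path or the API of ace/features.py; the `type: ignore` codes mirror the ones the commits mention.

```python
import importlib.util
from typing import Any, Optional

try:
    # Hypothetical optional package; substitute the real tracing dependency.
    from hypothetical_tracing_pkg import TracingLogger
except ImportError:
    TracingLogger = None  # type: ignore[assignment,misc]


def is_available(module_name: str) -> bool:
    """Feature detection: check whether a module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def make_logger() -> Optional[Any]:
    # Check for None before calling the constructor, so the code is safe at
    # runtime and mypy accepts it when the dependency is missing.
    if TracingLogger is None:
        return None
    return TracingLogger()  # type: ignore[misc]
```

Returning `None` instead of raising lets callers treat observability as strictly opt-in.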
Set up automated code quality checks with pre-commit:
- Added pre-commit dependency to dev requirements
- Created .pre-commit-config.yaml with Black (formatter) and Mypy (type checker)
- Added Black and Mypy configuration to pyproject.toml
- Formatted all Python files with Black (42 files reformatted)

Pre-commit hooks now automatically:
- Format code with Black on every commit
- Type-check with Mypy (checking the ace/ directory only)

This ensures consistent code style and catches type errors before they reach CI/CD.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Rename common.py → shared.py with enhanced docs
- Rename utils.py → debug.py for clarity
- Create form-filler/form_utils.py for consistency
- Update all imports across examples
- Add template function documentation

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add workflow diagram to README showing ACE data flow
- Simplify folder structure documentation
- Enhance TEMPLATE.py with better error handling and output capture
- Fix method name in ace_form_filler.py (to_file → save_to_file)
- Reduce test domains to 2 for faster testing

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- baseline-grocery-price-comparison.py: full 3-store comparison (Migros, Coop, Aldi)
- test-baseline-grocery-price-comparison.py: single-store test version (Migros only)

Features:
- Automated grocery shopping for 5 essential items across Swiss stores
- Price comparison with basket totals and item details
- Performance metrics tracking (steps, browser-use tokens)
- Regex parsing for structured agent output
- Console-only output following the domain-checker demo pattern
- Anthropic Claude 4.5 model integration

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
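A sketch of regex parsing for structured agent output, as listed in the features above. The line format (`- Milk: CHF 1.60` for items, `TOTAL: CHF 4.10` for the basket) and the `parse_basket` function are assumptions for illustration; the demo's actual output format is not shown in this commit.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed line format: "- Milk: CHF 1.60" for items, "TOTAL: CHF 4.10" for the basket.
PRICE_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?(?P<name>[^:\n]+?)[ \t]*:[ \t]*CHF[ \t]*(?P<price>\d+(?:\.\d+)?)[ \t]*$",
    re.MULTILINE,
)


def parse_basket(output: str) -> Tuple[Dict[str, float], Optional[float]]:
    """Extract item prices and the basket total from agent output."""
    items: Dict[str, float] = {}
    total: Optional[float] = None
    for match in PRICE_LINE.finditer(output):
        name, price = match.group("name"), float(match.group("price"))
        if name.upper() == "TOTAL":
            total = price
        else:
            items[name] = price
    return items, total
```

Lines that do not follow the `name: CHF price` shape (such as a `Store: Migros` header) are simply skipped rather than raising.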
… raw logs passing
Changes:
- Pass only raw browser-use logs to the Reflector (no analysis/metrics)
- Clean up execution log collection (remove commentary)
- Increase max_tokens to 8192 for all ACE roles (Generator, Reflector, Curator)
- Fix AttributeError: bullet.helpful_count → bullet.helpful
- Prevent JSON truncation errors with large browser automation logs

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Clean up online shopping demo by removing obsolete example files.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove old Migros-specific demo files
- Add new consolidated ace-online-shopping.py and baseline-online-shopping.py demos
- Include results screenshot showing performance comparison

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Replace complex 678-line implementation with a simple 207-line loop:
- Single high-level prompt (Claude works autonomously)
- Stall detection via commit counting
- Remove TODO.md parsing and validation phases
- Delete run_experiment.py and analyze_runs.py (they depended on the old structure)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
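Stall detection via commit counting can be sketched as below, assuming the loop reads the commit count after each iteration and stops once it has not grown for a few iterations in a row. `StallDetector`, `max_idle_iterations`, and `commit_count` are hypothetical names; `git rev-list --count HEAD` is a real git command that counts commits reachable from HEAD.

```python
import subprocess
from typing import Optional


def commit_count(repo_dir: str) -> int:
    """Return the number of commits reachable from HEAD in repo_dir."""
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


class StallDetector:
    """Flags a stall when the commit count stops increasing."""

    def __init__(self, max_idle_iterations: int = 3) -> None:
        self.max_idle = max_idle_iterations
        self.last_count: Optional[int] = None
        self.idle = 0

    def update(self, count: int) -> bool:
        """Record the commit count after an iteration; return True if stalled."""
        if self.last_count is not None and count <= self.last_count:
            self.idle += 1  # no new commits since the previous iteration
        else:
            self.idle = 0  # progress was made, reset the counter
        self.last_count = count
        return self.idle >= self.max_idle
```

A loop would call `detector.update(commit_count(workspace))` after each agent run and stop when it returns `True`. Counting commits is a cheap proxy for progress that needs no parsing of the agent's output.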
- Add prompt.md for easy task customization
- Simplify README.md to ~50 lines
- Remove outdated IMPLEMENTATION_NOTES.md and playbooks/
- Update ace_loop.py to load the prompt from file

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update example code: Playbook→Skillbook, Bullet→Skill

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove specs/ folder (TypeScript-specific)
- Simplify prompt.md, README, and reset_workspace.sh
- Update terminology: playbook→skillbook, bullets→skills
- Reduce reset_workspace.sh from 235 to 103 lines

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Enhanced README with results table, prompt tips, and example
- Updated reset_workspace.sh to archive workspace and skillbook to logs
- Simplified prompt.md template
- Made workspace_template/.env.example fully commented out
- Removed workspace_template/README.md to avoid confusion

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…egration
Add Claude Code learning loop example
- Use .env.ace for ACE loop config (API key, model)
- AUTO_MODE defaults to true (fully automatic)
- Update README to use uv commands
- Clarify workspace_template/.env.example purpose

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
No description provided.