perf: optimize Linker to avoid redundant AGENTS.md compression#171
yacosta738 wants to merge 3 commits into main from
Conversation
Implemented a two-tier caching strategy in the `Linker` to eliminate redundant work during synchronization: 1. `compression_cache`: Caches the result of Markdown compression per source file, avoiding redundant re-reading and re-compressing of the same AGENTS.md for different agents. 2. `ensured_outputs`: Tracks which intermediate compressed files have already been verified/written in the current run, avoiding redundant disk I/O and comparisons. These optimizations significantly reduce CPU and I/O overhead for projects with many AI agents sharing the same configuration instructions. Co-authored-by: yacosta738 <33158051+yacosta738@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
Summary by CodeRabbit

📝 Walkthrough

Linker gained an internal compression cache and an ensured-outputs set; `write_compressed_agents_md` now reuses cached compressed AGENTS.md (`Rc`), compares destination content to avoid redundant writes, ensures destination directories exist, and marks destinations as ensured once verified or written.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Linker as Linker
    participant Compressor as compress_agents_md
    participant Cache as CompressionCache
    participant FS as FileSystem
    Linker->>Cache: check compression_cache for `src_path`
    alt cache hit
        Cache-->>Linker: Rc<String> compressed_content
    else cache miss
        Linker->>Compressor: compute compressed_content
        Compressor-->>Linker: String
        Linker->>Cache: store Rc<String> for `src_path`
    end
    Linker->>FS: read `dest_path` content (if exists)
    FS-->>Linker: existing_content
    Linker->>Linker: compare existing_content vs compressed_content
    alt identical
        Linker->>Linker: add `dest_path` to ensured_outputs
    else different
        Linker->>FS: ensure dest dir exists
        Linker->>FS: write compressed_content (by reference)
        FS-->>Linker: write result
        Linker->>Linker: add `dest_path` to ensured_outputs
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/linker.rs`:
- Around line 258-269: The cache currently stores owned Strings and performs a full
String::clone on every cache hit in the block that reads AGENTS.md. Change the
compression_cache field to store Rc<String> (or Rc<str>) instead of String,
wrap the compressed result in Rc::new(...) on insertion, and make cache hits
return a cheap Rc clone (cached.clone() then copies only the pointer) so
consumers share one heap allocation. Finally, update call sites that expect
&str (e.g., where compressed is used later) to use &*compressed or
compressed.as_str(), keeping compress_agents_md_content and the same cache
lookup/insert logic but storing Rc-wrapped values.
✅ Contributor Report — User: @yacosta738
Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-02-13 to 2026-02-13
Further optimized the `Linker` by using `Rc<String>` for the `compression_cache`. This allows multiple agents to share the same heap-allocated compressed content without deep string clones, reducing both memory usage and CPU time during synchronization. Also ensured that redundant disk existence and content checks are avoided via the `ensured_outputs` cache. Co-authored-by: yacosta738 <33158051+yacosta738@users.noreply.github.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/linker.rs (1)
252-295: 🧹 Nitpick | 🔵 Trivial — Caching logic is correct and well-structured.

The `Rc::clone` on line 263 is O(1) as intended, the `ensured_outputs` early return avoids redundant I/O, and the `*compressed` dereferences are correct for both the comparison (line 276) and the write (line 289). One observation: there is no dedicated test exercising the multi-agent scenario (N agents referencing the same AGENTS.md with compression enabled) to verify that compression and disk writes happen only once. The existing functional tests ensure correctness but don't assert the caching behavior that this PR is specifically optimizing. Consider adding a test like:
```rust
#[test]
#[cfg(unix)]
fn test_compression_cache_avoids_redundant_writes() {
    let temp_dir = TempDir::new().unwrap();
    let agents_dir = temp_dir.path().join(".agents");
    fs::create_dir_all(&agents_dir).unwrap();
    let source_file = agents_dir.join("AGENTS.md");
    fs::write(&source_file, "## Title\n\nSome spaced text.\n").unwrap();
    let config_path = agents_dir.join("agentsync.toml");
    let config_content = r#"
source_dir = "."
compress_agents_md = true

[agents.claude]
enabled = true

[agents.claude.targets.main]
source = "AGENTS.md"
destination = "CLAUDE.md"
type = "symlink"

[agents.copilot]
enabled = true

[agents.copilot.targets.main]
source = "AGENTS.md"
destination = "COPILOT.md"
type = "symlink"
"#;
    fs::write(&config_path, config_content).unwrap();
    let config = Config::load(&config_path).unwrap();
    let linker = Linker::new(config, config_path);
    let result = linker.sync(&SyncOptions::default()).unwrap();
    assert_eq!(result.created, 2);

    // Both symlinks should resolve to the same compressed file
    let compressed = agents_dir.join("AGENTS.compact.md");
    assert!(compressed.exists());

    // Verify caches were populated
    assert_eq!(linker.compression_cache.borrow().len(), 1);
    assert!(linker.ensured_outputs.borrow().contains(&compressed));
}
```

This would directly verify the optimization is working as intended.
Further optimized the `Linker` by using `Rc<String>` for the `compression_cache`. This allows multiple agents to share the same heap-allocated compressed content without deep string clones, reducing both memory usage and CPU time during synchronization. Also ensured that redundant disk existence and content checks are avoided via the `ensured_outputs` cache. Fixed formatting issues in `src/linker.rs`. Co-authored-by: yacosta738 <33158051+yacosta738@users.noreply.github.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/linker.rs (1)
987-1036: 🛠️ Refactor suggestion | 🟠 Major — Missing test for the core optimization: multiple agents sharing the same compressed AGENTS.md

The PR's stated goal is eliminating O(N) redundant compression when multiple agents reference the same AGENTS.md with `compress_agents_md = true`. However, every compression-related test (`test_sync_compresses_agents_md_when_enabled`, `test_sync_symlink_contents_compresses_agents_md`) uses a single agent. The cache-hit path (lines 262–264) and the `ensured_outputs` short-circuit (line 254) are never exercised by the test suite, leaving the optimization's correctness unverified.

✅ Proposed test for multi-agent sharing scenario

```rust
#[test]
#[cfg(unix)]
fn test_sync_multiple_agents_share_compressed_agents_md() {
    let temp_dir = TempDir::new().unwrap();
    let agents_dir = temp_dir.path().join(".agents");
    fs::create_dir_all(&agents_dir).unwrap();
    fs::write(
        agents_dir.join("AGENTS.md"),
        "## Title \n\nSome text\twith spacing.\n",
    )
    .unwrap();
    let config_path = agents_dir.join("agentsync.toml");
    let config_content = r#"
source_dir = "."
compress_agents_md = true

[agents.claude]
enabled = true

[agents.claude.targets.main]
source = "AGENTS.md"
destination = "CLAUDE.md"
type = "symlink"

[agents.copilot]
enabled = true

[agents.copilot.targets.main]
source = "AGENTS.md"
destination = "COPILOT.md"
type = "symlink"
"#;
    fs::write(&config_path, config_content).unwrap();
    let config = Config::load(&config_path).unwrap();
    let linker = Linker::new(config, config_path);
    let result = linker.sync(&SyncOptions::default()).unwrap();

    // Both agents must be created
    assert_eq!(result.created, 2);

    // The single compressed file must exist
    let compressed = agents_dir.join("AGENTS.compact.md");
    assert!(compressed.exists());

    // Both symlinks must resolve to that same compressed file
    for dest_name in &["CLAUDE.md", "COPILOT.md"] {
        let dest = temp_dir.path().join(dest_name);
        assert!(dest.is_symlink(), "{dest_name} should be a symlink");
        let link_target = fs::read_link(&dest).unwrap();
        let linked_canon =
            fs::canonicalize(dest.parent().unwrap().join(link_target)).unwrap();
        assert_eq!(
            linked_canon,
            fs::canonicalize(&compressed).unwrap(),
            "{dest_name} should link to the shared compressed file"
        );
    }

    // Compressed content should be normalized (inline spaces collapsed)
    let content = fs::read_to_string(&compressed).unwrap();
    assert!(content.contains("Some text with spacing."));
}
```

Would you like me to open a new issue to track adding this test?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/linker.rs` around lines 987 - 1036, The test suite misses covering the multi-agent cache-hit path for compressed AGENTS.md; add a new unit test (e.g., test_sync_multiple_agents_share_compressed_agents_md) that creates two agents in the same config both pointing to the same AGENTS.md with compress_agents_md = true, then run Linker::new(...).sync(&SyncOptions::default()) and assert result.created == 2, that a single AGENTS.compact.md exists, and that both generated symlink targets (read via fs::read_link on each destination) canonicalize to that same compressed file to exercise the cache-hit and ensured_outputs short-circuit paths in Linker.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/linker.rs`:
- Around line 252-256: The caches ensured_outputs and compression_cache on the
Linker persist across multiple sync() calls causing write_compressed_agents_md
and compression lookups to return stale results; to fix, at the start of
Linker::sync() clear both caches (e.g., borrow_mut().clear() on the
ensured_outputs and compression_cache) so each sync run recomputes outputs, and
keep references to these fields (ensured_outputs, compression_cache) and
write_compressed_agents_md in mind when making the change; alternatively, if you
choose not to clear them, add clear documentation on the Linker struct and the
sync() method stating the single-run lifetime contract.
- Around line 56-60: Add a short explanatory comment above the compression_cache
field documenting that using Rc inside RefCell is intentional to keep Linker
single-threaded and avoid Arc's atomic overhead; mention that Rc<String> makes
Linker non-Send and that this is acceptable because the CLI is single-threaded
and Linker is never moved across threads in practice, and note that using Arc
would add unnecessary atomic cost — apply this comment to the compression_cache:
RefCell<HashMap<PathBuf, Rc<String>>> declaration near ensured_outputs to make
the intent explicit.
---
Outside diff comments:
In `@src/linker.rs`:
- Around line 987-1036: The test suite misses covering the multi-agent cache-hit
path for compressed AGENTS.md; add a new unit test (e.g.,
test_sync_multiple_agents_share_compressed_agents_md) that creates two agents in
the same config both pointing to the same AGENTS.md with compress_agents_md =
true, then run Linker::new(...).sync(&SyncOptions::default()) and assert
result.created == 2, that a single AGENTS.compact.md exists, and that both
generated symlink targets (read via fs::read_link on each destination)
canonicalize to that same compressed file to exercise the cache-hit and
ensured_outputs short-circuit paths in Linker.
```rust
/// Cache of compressed content for AGENTS.md files to avoid redundant processing.
/// Uses Rc to share the same heap-allocated string across multiple agents.
compression_cache: RefCell<HashMap<PathBuf, Rc<String>>>,
/// Tracks which compressed files have already been ensured to be up-to-date.
ensured_outputs: RefCell<HashSet<PathBuf>>,
```
🧹 Nitpick | 🔵 Trivial
🧩 Analysis chain
🏁 Scripts executed (against repository dallay/agentsync):

```shell
# Check if Linker has explicit Send/Sync bounds or implementations
rg -n --type=rust 'impl.*Send.*Linker|impl.*Sync.*Linker|Linker.*Send|Linker.*Sync' src/

# Check for usage patterns that would require Linker to be Send
rg -n --type=rust 'Arc<Linker>|thread::spawn|Linker.*crossbeam|Linker.*tokio|Linker.*async' src/ tests/

# Examine the Linker struct definition itself
rg -A 20 'pub struct Linker' src/linker.rs | head -60

# Check if there are any tests asserting Send/Sync trait bounds on Linker
rg -n --type=rust 'assert_.*Send|assert_.*Sync|fn assert_send|fn assert_sync' tests/ src/

# Check all fields of Linker struct to understand pre-PR Send status
rg -A 10 'pub struct Linker' src/linker.rs | head -30

# Check how Linker is used in main.rs - does it get moved between threads?
cat -n src/main.rs | head -80

# Look for any documentation or comments about Send/Sync
rg -n --type=rust 'Send|Sync|thread|Arc<' src/linker.rs | head -20

# Check if there are any public async functions that take Linker
rg -n --type=rust 'async fn.*Linker|pub.*async.*fn' src/linker.rs | head -20

# Check if Config is Send (to verify pre-PR Linker was indeed Send)
rg -n --type=rust 'pub struct Config' src/config.rs | head -5

# Check actual usage of Linker in main.rs - is it ever Arc'd or spawned?
cat -n src/main.rs | tail -150

# Look for any library documentation or CHANGELOG mentioning Send/thread-safety
fd -e md -e toml -e txt | xargs grep -l -i 'send\|thread\|concurrent\|async' 2>/dev/null | head -10

# Check if Rc/Arc are used elsewhere in the codebase to understand project's threading posture
rg -n 'use std::rc::Rc|use std::sync::Arc' src/

# Final check: Does Config struct export Send guarantee or any trait bounds?
rg -A 5 'pub struct Config' src/config.rs | head -20

# Check if there's any documentation in lib.rs about Linker's thread safety
rg -B 5 -A 5 'pub use linker::Linker' src/lib.rs

# Verify: are there any comments in the PR or commit history suggesting intentional single-threaded design?
git log --oneline --all | grep -i 'rc\|cache\|compress' | head -10

# Check the commit message for the Rc introduction to understand intent
git show e3cbd79 --stat | head -30

# Check if there's any public documentation or README mentioning thread-safety
cat README.md 2>/dev/null | head -50

# Verify: does any test or code explicitly construct Linker in a way that would break if it's not Send?
rg -n --type=rust 'fn.*Linker.*\->' src/ tests/
```
Document the intentional single-threaded design of Linker
The compression_cache: RefCell<HashMap<PathBuf, Rc<String>>> field uses Rc, which does not implement Send, making Linker: !Send. While this is technically a narrowing of the struct's trait implementations, it reflects a deliberate performance choice documented in the commit message: to reduce memory usage and CPU time during compression by avoiding deep string clones across agents.
Since this is a single-threaded CLI tool with no async or threading patterns, and Linker is never moved across thread boundaries in practice, the practical impact is minimal. However, to prevent confusion for future library users, add a comment documenting the intentional single-threaded design:
```rust
/// Intentionally uses Rc for single-run compression caching.
/// Linker is not Send; designed for single-threaded synchronization operations.
compression_cache: RefCell<HashMap<PathBuf, Rc<String>>>,
```

Using Arc would add unnecessary atomic overhead without benefit in this context.
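The `!Send` consequence discussed above can be checked at compile time. A hedged sketch with hypothetical types (not from the repo): a helper with a `Send` bound rejects the `Rc`-based cache but accepts an `Arc`/`Mutex` variant.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::path::PathBuf;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

// Rc-based cache: not Send, fine for a single-threaded CLI.
#[allow(dead_code)]
struct RcCache {
    cache: RefCell<HashMap<PathBuf, Rc<String>>>,
}

// Arc/Mutex variant: Send + Sync, at the cost of atomic refcounts and locking.
#[allow(dead_code)]
struct ArcCache {
    cache: Mutex<HashMap<PathBuf, Arc<String>>>,
}

fn assert_send<T: Send>() {}

fn main() {
    // assert_send::<RcCache>(); // would not compile: Rc<String> is !Send
    assert_send::<ArcCache>(); // compiles: the Mutex/Arc variant is Send
    println!("ArcCache is Send; RcCache is not");
}
```

The commented-out line is the whole point: uncommenting it makes the build fail, which is the compiler enforcing the single-threaded contract the review asks to document.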
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/linker.rs` around lines 56 - 60, Add a short explanatory comment above
the compression_cache field documenting that using Rc inside RefCell is
intentional to keep Linker single-threaded and avoid Arc's atomic overhead;
mention that Rc<String> makes Linker non-Send and that this is acceptable
because the CLI is single-threaded and Linker is never moved across threads in
practice, and note that using Arc would add unnecessary atomic cost — apply this
comment to the compression_cache: RefCell<HashMap<PathBuf, Rc<String>>>
declaration near ensured_outputs to make the intent explicit.
```rust
fn write_compressed_agents_md(&self, source: &Path, dest: &Path) -> Result<()> {
    let content = fs::read_to_string(source)
        .with_context(|| format!("Failed to read AGENTS.md: {}", source.display()))?;
    let compressed = compress_agents_md_content(&content);
    // 1. Skip if this specific destination has already been ensured to be up-to-date.
    if self.ensured_outputs.borrow().contains(dest) {
        return Ok(());
    }
```
Stale ensured_outputs / compression_cache if sync() is called more than once on the same Linker
Neither cache is cleared between sync() invocations. If a caller reuses a Linker instance after a source file changes (e.g., in a long-running process or a test harness), ensured_outputs will short-circuit write_compressed_agents_md (Line 254) and compression_cache will return the now-stale compressed content, silently leaving the destination file out of date.
The intended single-run lifetime should be either documented on the struct or enforced by clearing the caches at the top of sync():
🛡️ Option A — Clear caches at the start of each `sync()` call
```diff
 pub fn sync(&self, options: &SyncOptions) -> Result<SyncResult> {
+    self.compression_cache.borrow_mut().clear();
+    self.ensured_outputs.borrow_mut().clear();
     let mut result = SyncResult::default();
```

📄 Option B — Document the single-run contract on the struct

```diff
+/// **Note:** The internal `compression_cache` and `ensured_outputs` caches are
+/// populated during a single `sync()` run and are never cleared. Reusing a
+/// `Linker` instance after source files change will produce stale results.
 pub struct Linker {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/linker.rs` around lines 252 - 256, The caches ensured_outputs and
compression_cache on the Linker persist across multiple sync() calls causing
write_compressed_agents_md and compression lookups to return stale results; to
fix, at the start of Linker::sync() clear both caches (e.g.,
borrow_mut().clear() on the ensured_outputs and compression_cache) so each sync
run recomputes outputs, and keep references to these fields (ensured_outputs,
compression_cache) and write_compressed_agents_md in mind when making the
change; alternatively, if you choose not to clear them, add clear documentation
on the Linker struct and the sync() method stating the single-run lifetime
contract.
⚡ Bolt: optimize Linker to avoid redundant AGENTS.md compression
💡 What: Implemented a more robust caching strategy in the `Linker` to avoid redundant processing of the same AGENTS.md file multiple times during a single `agentsync apply` run.

🎯 Why: Previously, if multiple agents (e.g., Claude, Copilot, Gemini) all pointed to the same AGENTS.md source with compression enabled, the `Linker` would re-read, re-compress, and re-verify the intermediate compact file for every single agent. This caused unnecessary CPU and I/O overhead, proportional to the number of agents.

📊 Impact:
🔬 Measurement:
PR created automatically by Jules for task 18174439066230647256 started by @yacosta738