feat(init/docs): enforce rgai-first search policy and reduce token cost#118
feat(init/docs): enforce rgai-first search policy and reduce token cost#118heAdz0r wants to merge 2 commits intortk-ai:masterfrom
Conversation
Reviewer Checklist (Enterprise QA)For faster and consistent validation, this checklist covers correctness, compatibility, and operational risk.
If requested, I can add follow-up benchmarks on additional repos/workloads for external validity. |
|
Benchmark table cleanup update (requested):
Fairness note:
Current top-line results:
|
|
Security workflow note for maintainers:
Could a maintainer please trigger/approve the workflow run for this fork PR so Layer 1 results are recorded in Actions summary? Layer 1 local preflight was run and indicates:
|
FlorianBruniaux
left a comment
There was a problem hiding this comment.
Review: request changes
Blocker: rtk rgai command doesn't exist
The entire PR documents, hook-rewrites, and tests a command (rtk rgai) that is not implemented in the codebase. There is no rgai match in src/main.rs, no rgai_cmd.rs module — nothing. Every grepai search ... or rgai ... that gets rewritten by the hook to rtk rgai ... will crash with "unrecognized subcommand".
This is the main blocker. The search policy documentation should land together with (or after) the actual command implementation, not before.
Blocker: benchmark data from private source
The README now includes a benchmark table citing "internal migration benchmark artifacts (private repository)". This is not verifiable by any external contributor or user. Either:
- Make the benchmark reproducible (publish the dataset/script), or
- Remove the table from the public README and keep it in internal docs
Putting unverifiable claims in a project README weakens trust.
Scope: hook changes for find/tree/wget are unrelated
The test file changes find, tree, and wget from "NOT rewritten" to rewritten (rtk find, rtk tree, rtk wget). This is a behavior change unrelated to the rgai search policy. Should be in a separate PR.
Good parts
upsert_rtk_block()is well designed: handles all 4 cases (add/update/unchanged/malformed), good test coverage, safe with existing content. This refactor alone is mergeable.- Hook rewrite ordering is correct: most specific patterns (
grepai search) match first, thenrgai, thenrg, thengrep. Deterministic and sound. - Test infrastructure improvements (HOOK override via env var, script-relative path) are solid.
Suggested split
- PR A:
upsert_rtk_blockrefactor ininit.rs— mergeable now, pure improvement - PR B:
rtk rgaicommand implementation — the actual feature - PR C: Search policy docs + hooks + templates — lands with or after PR B
- PR D: Hook find/tree/wget rewrites — separate scope
| rtk ls <path> # Tree format, compact (65%) | ||
| rtk read <file> # Code reading with filtering (60%) | ||
| rtk grep <pattern> # Search grouped by file (75%) | ||
| rtk rgai <query> # Semantic search ranked by relevance (85%) |
There was a problem hiding this comment.
rtk rgai <query> is documented here but the command does not exist in src/main.rs. Running this will crash with "unrecognized subcommand". The template should not reference commands that aren't implemented yet.
| elif echo "$MATCH_CMD" | grep -qE '^(rg|grep)[[:space:]]+'; then | ||
| REWRITTEN="${ENV_PREFIX}$(echo "$CMD_BODY" | sed -E 's/^(rg|grep) /rtk grep /')" | ||
| elif echo "$MATCH_CMD" | grep -qE '^(grepai|rgai)[[:space:]]+search([[:space:]]|$)'; then | ||
| REWRITTEN="${ENV_PREFIX}$(echo "$CMD_BODY" | sed -E 's/^(grepai|rgai)[[:space:]]+search[[:space:]]+/rtk rgai /')" |
There was a problem hiding this comment.
This rewrite rule sends grepai search X to rtk rgai X, but rtk rgai is not a valid subcommand. The hook will silently transform a working command (grepai search) into a broken one (rtk rgai).
This rule should land together with the rtk rgai implementation.
| "rtk docker exec -it db psql" | ||
|
|
||
| test_rewrite "find (NOT rewritten — different arg format)" \ | ||
| test_rewrite "find with native args" \ |
There was a problem hiding this comment.
These three tests changed from expecting no rewrite (empty string) to expecting rewrites (rtk find, rtk tree, rtk wget). This is a behavior change unrelated to the rgai search policy — should be in a separate PR.
| @@ -48,6 +48,30 @@ With rtk: **~45,000 tokens** → **70% reduction** | |||
|
|
|||
| > Estimates based on medium-sized TypeScript/Rust projects. Actual savings vary by project size. | |||
|
|
|||
There was a problem hiding this comment.
"Source: internal migration benchmark artifacts (private repository)" — this benchmark is not reproducible or verifiable by contributors. Either publish the benchmark script/dataset or move this table to internal docs.
PR Split per Review FeedbackThank you @FlorianBruniaux for the thorough review. All three blockers addressed. This PR has been split into 3 focused PRs:
Blocker resolution
Merge order
This PR can be closed once the split PRs are reviewed. |
|
@FlorianBruniaux Thank you for such a thorough and constructive review. I'm relatively new to open-source collaboration, and this kind of detailed, well-structured feedback is exactly what helps me grow as a contributor. I've learned a lot from this — especially about keeping PRs atomic and never shipping docs ahead of implementation. |
Summary
This PR enforces a single RTK search policy across hooks, init templates, docs, and tests:
Search priority (mandatory):
rgai > rg > grepIt keeps
rtk grepas the exact/regex interface and does not introducertk rgas a public command.Token Efficiency (Single Table, Same Data)
Benchmark setup: same migration dataset, same 5 query intents, same run date (February 14, 2026), 3 methods:
standard greprtk greprtk rgai(grepai-backed semantic search)Fairness note:
rtk grepwas run with--max 120to align withgrep ... | head -n 120.grep(est_tokens)rtk grep(est_tokens)rtk rgai(est_tokens)rtk grepvsgreprtk rgaivsgrepgrep->rtk grep->rtk rgai)token_optquality_normTakeaway:
rtk grepreduces token footprint for exact/regex workflows.rtk rgaigives the strongest token reduction and best token-to-signal density for semantic discovery.Prerequisites for
rtk rgairtk rgaidepends on grepai.Required:
grepaiinstalled globallyOllama,LM Studio, orOpenAI)grepai init,grepai watch)If
grepaiis unavailable, usertk grep(rg -> grepinternal fallback).Security Compliance (SECURITY.md Workflow)
Layer 1: Automated security-check.yml
src/init.rsCargo.tomlor workflow modificationsLayer 2: Claude skill
/rtk-pr-security 118Layer 3: Manual review
init.rs) in workflow rules.Implementation Details
grepai/rgai search ...->rtk rgai ...rgai ...->rtk rgai ...rg ...->rtk grep ...grep ...->rtk grep ...rtk inittemplates now include mandatory policy andrtk rgai-first examples.--claude-mdflow now upserts existing RTK block (add/update/no-op/malformed-safe).Strangeness Tax Mitigation
Docs define a deterministic escalation ladder:
rtk rgaifor semantic discoveryrtk grepfor exact/regex narrowingrtk proxy ...for raw/native behaviorVerification
cargo test init:: -- --nocapturecargo test rgai -- --nocapturebash hooks/test-rtk-rewrite.shgrepvsrtk grepvsrtk rgaion same migration task