
release: v0.2.1 — first-run experience reset#601

Merged
kokevidaurre merged 137 commits into main from develop
Mar 13, 2026

Conversation

@kokevidaurre
Contributor

Summary

  • Version 0.2.1: Complete first-run experience reset for new users
  • squads init --yes works without gh CLI (warning, not error)
  • Default business brief: "autonomous execution + track big AI players" — interesting first-run output
  • Auto-commits scaffolding so agents can use git worktrees immediately
  • 4 core squads (research, company, intelligence, product) for every user
  • Getting Started shows squads run as autopilot entry point
  • Phase-ordered execution, role-based context, infra cleanup
  • 1732 tests passing, all CI checks green

Notable changes since v0.7.0

Test plan

  • Docker first-run test passes (Ubuntu 24.04 + Node 22)
  • 1732 unit tests passing
  • E2E first-run journey test passing
  • Smoke test passing
  • npm install smoke test passing

🤖 Generated with Claude Code

kokevidaurre and others added 30 commits February 21, 2026 12:32
Closes #342

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…351)

Prevents shell injection via crafted paths in background and watch
execution modes. Applies same escaping used in foreground mode (PR #324).

Adds shellEscape() helper that replaces single quotes with '\'' to
safely interpolate variables into single-quoted shell strings. Applied to:
- Watch mode: projectRoot, worktreeDir, branchName, logFile, pidFile
- Background mode: projectRoot, worktreeDir, branchName, logFile, pidFile
- Provider background mode: workDir, logFile, pidFile, provider args
- execSync worktree calls in foreground and provider modes
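The escaping rule above can be sketched as follows. This is a minimal illustration, not the project's actual `shellEscape()` source; the `quote()` wrapper and the sample path are hypothetical.

```typescript
// Escape a value for use inside a single-quoted POSIX shell string.
// Each ' becomes '\'' : close the quote, emit an escaped quote, reopen.
function shellEscape(value: string): string {
  return value.replace(/'/g, "'\\''");
}

// Hypothetical convenience wrapper: escape, then wrap in single quotes.
function quote(value: string): string {
  return `'${shellEscape(value)}'`;
}

// A crafted path that would break naive interpolation into a shell script.
const worktreeDir = "/tmp/wt'; echo pwned; '";
const script = `cd ${quote(worktreeDir)} && git status`;
console.log(script);
```

Because the whole value stays inside single quotes, the shell treats `; echo pwned;` as part of the directory name rather than as a new command.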

Closes #340

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
v0.6.2 released, 3 security P1 issue-solvers dispatched,
751 tests passing, Q1 goals 2/3 achieved.

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…339)

Closes #319

Added default .action(() => cmd.outputHelp()) to 7 parent commands
(env, kpi, feedback, session, trigger, approval, autonomous) so they
exit 0 instead of 1 when invoked without a subcommand. Matches the
pattern already used by memory, goal, deploy, and exec commands.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
…354)

Replace scattered console.log calls with the project's writeLine()
utility from src/lib/terminal.ts. This provides a single output
layer for consistent formatting and future output control.

- Convert 238 console.log calls to writeLine across 10 files
- Remove 8 debug/placeholder log statements from anthropic.ts
- Keep console.log only for JSON.stringify output (--json flags)
  and raw prompt piping — standard CLI patterns
- Reduction: 269 → 31 occurrences (88% decrease)
- Zero new TypeScript errors

Files: init.ts, deploy.ts, autonomous.ts, trigger.ts, approval.ts,
eval.ts, login.ts, cli.ts, anthropic.ts, update.ts
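A minimal sketch of the single-output-layer idea, assuming `writeLine()` is a thin wrapper over `process.stdout.write` (the real helper in `src/lib/terminal.ts` may differ). It also shows the stdout-capture technique that tests need once output no longer goes through `console.log`; `captureStdout` is a hypothetical test utility.

```typescript
// Single output layer: user-facing text goes through one function.
function writeLine(text: string = ""): void {
  process.stdout.write(text + "\n");
}

// Test helper: temporarily replace process.stdout.write and collect output.
function captureStdout(fn: () => void): string {
  const chunks: string[] = [];
  const original = process.stdout.write.bind(process.stdout);
  process.stdout.write = ((chunk: string | Uint8Array): boolean => {
    chunks.push(typeof chunk === "string" ? chunk : Buffer.from(chunk).toString());
    return true;
  }) as typeof process.stdout.write;
  try {
    fn();
  } finally {
    // Always restore the real writer, even if fn throws.
    process.stdout.write = original;
  }
  return chunks.join("");
}

const output = captureStdout(() => {
  writeLine("Deployed 3 agents");
  writeLine();
});
```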

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Replace minimal README with comprehensive 331-line version covering:
- Quick start with real output examples
- Why Squads (4 differentiators)
- Provider table (7 LLM providers)
- Feature showcase (dashboard, memory, sessions, autonomous, hooks)
- Command reference (21 active commands, no removed ones)
- Project structure and configuration examples
- Development guide and tech stack
- Contributing and community links

References only current commands (memory write/read instead of learn,
env show instead of context, exec list instead of history).

🤖 Generated with [Agents Squads](https://agents-squads.com)

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Closes agents-squads/engineering#51

Removed the base64-obfuscated API key from source code and replaced
with SQUADS_TELEMETRY_KEY env var. Telemetry send is skipped when key
is not set. The exposed key must be rotated server-side separately.
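A sketch of the "skip when the key is not set" behavior. `getTelemetryKey`, `sendTelemetry`, and the injected `send` callback are hypothetical names; only the `SQUADS_TELEMETRY_KEY` variable comes from the commit.

```typescript
type Sender = (key: string, payload: string) => void;

// Read the telemetry key from the environment instead of embedding it in source.
function getTelemetryKey(env: NodeJS.ProcessEnv = process.env): string | undefined {
  const key = env.SQUADS_TELEMETRY_KEY?.trim();
  return key ? key : undefined;
}

// Hypothetical sender: does nothing and returns false when no key is configured.
function sendTelemetry(
  event: Record<string, unknown>,
  send: Sender,
  env: NodeJS.ProcessEnv = process.env,
): boolean {
  const key = getTelemetryKey(env);
  if (!key) return false; // no key: skip silently
  send(key, JSON.stringify(event));
  return true;
}
```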

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #343

The daemon process was silently failing because Commander.js rejected
the unregistered --daemon CLI flag. Replace with SQUADS_DAEMON env var
to signal daemon mode, redirect child stdout/stderr to log file for
diagnosability, and show clear error when daemon fails to start.
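A sketch of signalling daemon mode through the environment and sending the child's stdout/stderr to a log file, using Node's `child_process.spawn`. The helper names are hypothetical and the value `"1"` for `SQUADS_DAEMON` is an assumption; the test does not call `startDaemon`, since that would actually spawn a process.

```typescript
import { spawn } from "node:child_process";
import type { SpawnOptions } from "node:child_process";
import { openSync } from "node:fs";

// Daemon mode is signalled by env var, so Commander never sees an unknown flag.
function isDaemonProcess(env: NodeJS.ProcessEnv = process.env): boolean {
  return env.SQUADS_DAEMON === "1";
}

// Spawn options for a detached child whose stdout/stderr append to a log file.
function daemonSpawnOptions(logFd: number, env: NodeJS.ProcessEnv = process.env): SpawnOptions {
  return {
    detached: true,
    stdio: ["ignore", logFd, logFd],
    env: { ...env, SQUADS_DAEMON: "1" },
  };
}

// Hypothetical launcher: re-run the current script as a background daemon.
function startDaemon(logFile: string): number {
  const logFd = openSync(logFile, "a");
  const child = spawn(process.execPath, process.argv.slice(1), daemonSpawnOptions(logFd));
  if (child.pid === undefined) {
    throw new Error(`daemon failed to start; see ${logFile}`);
  }
  child.unref(); // let the parent exit while the daemon keeps running
  return child.pid;
}
```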

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
* feat(status): show milestones and open PRs from GitHub

squads status now queries GitHub API for real operational data:
- Milestone progress bars across product repos (cli, console, api)
- Open PRs targeting develop with repo and number

Replaces vanity-only output with actionable org health metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(status): discover repos dynamically from squad definitions

Replace hardcoded PRODUCT_REPOS array with dynamic discovery:
- Read `repo` field from each SQUAD.md frontmatter
- Deduplicate and pass to fetchOperationalStatus()
- GitHub org derived from squad config, not hardcoded
- Dynamic column widths based on actual repo names
- Show all open PRs (not just develop-targeted)

Any user's squads with `repo:` in SQUAD.md will show milestones + PRs.
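A sketch of the discovery step: read `repo` from each SQUAD.md frontmatter and deduplicate. The line-based parse and the `org/name` repo format are assumptions; a real implementation would more likely use a YAML frontmatter parser.

```typescript
// Extract the `repo` value from a SQUAD.md frontmatter block delimited by ---.
function readRepoField(markdown: string): string | undefined {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return undefined;
  for (const line of match[1].split(/\r?\n/)) {
    const m = line.match(/^repo:\s*["']?([^"'\s]+)["']?\s*$/);
    if (m) return m[1];
  }
  return undefined;
}

// Unique repos across all squads, in first-seen order.
function discoverRepos(squadFiles: string[]): string[] {
  const repos = squadFiles
    .map((file) => readRepoField(file))
    .filter((r): r is string => r !== undefined);
  return Array.from(new Set(repos));
}

const repos = discoverRepos([
  "---\nname: engineering\nrepo: acme/cli\n---\n# Engineering",
  "---\nname: product\nrepo: acme/cli\n---\n",
  "---\nname: research\n---\n", // no repo: contributes nothing
  "---\nrepo: acme/console\n---\n",
]);
```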

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rewrite CLAUDE.md as user-facing guide

Remove internal references, org names, and dev-specific content. Focus on
teaching users how to define squads, run agents, and monitor work. Git-provider
agnostic. Engineering standards now live in hq CLAUDE.md (internal only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Closes #24

Converts ~50 static command imports to dynamic import() inside action
handlers. Only the invoked command's dependencies (pg, supabase, inquirer,
ora) are loaded, saving ~300ms+ on cold start.

Changes:
- All command handlers use dynamic import() in their .action() callbacks
- autoUpdateOnStartup skipped for --help/--version (instant response)
- register*Command imports kept static (needed for subcommand structure)
- Type-only import for SessionSummaryData (zero runtime cost)
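The lazy-loading pattern amounts to moving `import` from module scope into the handler. In the CLI this happens inside Commander `.action()` callbacks. The sketch below uses a hypothetical command table and `node:os` as a stand-in for heavy dependencies such as `pg` or `inquirer`.

```typescript
type Handler = () => Promise<string>;

// Hypothetical command table: each handler loads its dependency on first use,
// so invoking one command never pays for another command's imports.
const commands: Record<string, Handler> = {
  platform: async () => {
    const os = await import("node:os");
    return os.platform();
  },
  tmpdir: async () => {
    const { tmpdir } = await import("node:os");
    return tmpdir();
  },
};

async function runCommand(name: string): Promise<string> {
  const handler = commands[name];
  if (!handler) throw new Error(`unknown command: ${name}`);
  return handler();
}
```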

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
)

Closes #297

Show "squads dash" hints at key touchpoints:
- After successful foreground/background agent execution
- After lead session completion
- After parallel agent launch
- In squad detail status commands section

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Breaks down the 350-line executeWithClaude into 7 focused functions:
- buildAgentEnv: consolidates 3x duplicated env construction
- logVerboseExecution: DRYs up verbose config logging (was 2x identical)
- createAgentWorktree: isolates Node.js worktree creation
- buildDetachedShellScript: shared shell script for watch/background
- prepareLogFiles: shared log directory setup
- executeForeground: foreground spawn + status tracking
- executeWatch: watch mode (background + tail)

executeWithClaude is now a ~80-line coordinator that delegates to
the appropriate mode function.

Closes #158

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
…dless flags

Closes #371

Two fixes for Google/Gemini provider execution:

1. Add --yolo flag to Gemini CLI args for headless auto-approval.
   Without this, Gemini denies all tool calls when running in background
   because it can't prompt for interactive confirmation.

2. Copy .agents directory into worktree and rewrite prompt paths.
   Gemini CLI sandboxes file access to its workspace directory.
   The prompt references agent definitions at the original project root,
   which Gemini blocks as "Path not in workspace". Now we copy .agents
   into the worktree and rewrite absolute paths so Gemini can resolve them.
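A sketch of fix 2, assuming agent definitions live under `<projectRoot>/.agents` and the prompt references them by absolute path. Both helper names are hypothetical; `fs.cpSync` with `recursive: true` requires Node 16.7 or later.

```typescript
import { cpSync } from "node:fs";
import { join } from "node:path";

// Copy .agents into the worktree so a workspace-sandboxed CLI can read it.
function copyAgentsDir(projectRoot: string, worktreeDir: string): void {
  cpSync(join(projectRoot, ".agents"), join(worktreeDir, ".agents"), { recursive: true });
}

// Rewrite absolute <projectRoot>/.agents references to point into the worktree.
function rewriteAgentPaths(prompt: string, projectRoot: string, worktreeDir: string): string {
  const from = join(projectRoot, ".agents");
  const to = join(worktreeDir, ".agents");
  return prompt.split(from).join(to); // replace every occurrence
}
```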

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #280

Implements `squads create <name>` that creates:
- .agents/squads/<name>/SQUAD.md (from template)
- .agents/squads/<name>/lead.md (starter agent)
- .agents/memory/<name>/lead/ (memory directory)

Supports --description, --goal, --model flags for non-interactive use,
and interactive prompts via inquirer when flags are omitted.
Includes --force for overwriting and --yes for CI/scripting.
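The layout that `squads create <name>` produces can be expressed as a pure path computation. This is a sketch: the name-validation rule and the helper name are hypothetical.

```typescript
import { join } from "node:path";

interface SquadScaffold {
  squadFile: string;
  leadAgentFile: string;
  memoryDir: string;
}

// Paths written by `squads create <name>`, relative to the project root.
function squadScaffoldPaths(projectRoot: string, name: string): SquadScaffold {
  // Hypothetical rule: lowercase slug, which also rejects "../" traversal.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(name)) {
    throw new Error(`invalid squad name: ${name}`);
  }
  const agents = join(projectRoot, ".agents");
  return {
    squadFile: join(agents, "squads", name, "SQUAD.md"),
    leadAgentFile: join(agents, "squads", name, "lead.md"),
    memoryDir: join(agents, "memory", name, "lead"),
  };
}
```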

Note: organization.yaml is not used — squads are discovered dynamically
via filesystem (squad-parser.ts findSquadsDir + listSquads).

11 tests covering directory creation, content, naming, overwrite
protection, and squad discoverability.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: manual
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #366

When --cloud is set, the CLI dispatches agent execution to the platform
API instead of running locally. Requires `squads login` session and
SQUADS_API_URL environment variable.

Flow:
- POST /agent-dispatch to create dispatch request
- Poll /agent-executions for status updates
- Display execution summary on completion
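The polling step of this flow can be sketched independently of the HTTP details. The status names, interval, and timeout are assumptions; `fetchStatus` stands in for a request to `/agent-executions`, whose exact request and response shapes are not shown here.

```typescript
type ExecutionStatus = "queued" | "running" | "completed" | "failed";

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
}

// Poll until the execution reaches a terminal state, or give up at the deadline.
async function pollExecution(
  fetchStatus: () => Promise<ExecutionStatus>,
  { intervalMs, timeoutMs }: PollOptions,
): Promise<ExecutionStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await fetchStatus();
    if (status === "completed" || status === "failed") return status;
    if (Date.now() >= deadline) throw new Error("timed out waiting for cloud execution");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```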

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Trigger: smart
Model: claude-opus-4-6

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Closes #316

Added 63 tests covering 2 of the 6 lib modules listed in the issue:
- setup-checks.ts (48 tests): providers registry, commandExists,
  isDockerRunning, checkDockerPrereqs, checkGhCli, checkGhPermissions,
  checkClaudeCli, checkProviderAuth, runPrereqChecks, runAuthChecks,
  displayCheckResults, attemptFix, waitForService
- local.ts (15 tests): getLocalEnvVars, formatLocalStatus,
  isLangfuseLocal, getLocalStackStatus

Co-authored-by: Squads Cloud Worker <cloud@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…urces (#382)

Closes #314. Adds 115 tests across 4 test files achieving 92% statement
coverage and 80% branch coverage on the dashboard module:

- dashboard-loader.test.ts: 16 tests for findDashboardsDir, listDashboards,
  loadDashboard, clearDashboardCache, loadAllDashboards, findDashboard
- dashboard-renderers.test.ts: 49 tests for formatValue (all formats),
  getThresholdColor, calculateColumnWidths, and renderView (all view types)
- dashboard-sources.test.ts: 31 tests for buildQuery, buildWhereClause,
  parseDateRange, and postgresSource stub
- dashboard-engine.test.ts: 19 tests for executeDashboard, renderDashboard,
  and showAvailableDashboards with mocked dependencies

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
…381)

Closes #51

Changes:
- db.test.ts: Enable 4 previously skipped baseline tests (saveBaseline,
  getLatestBaseline, getBaselineByName, listBaselines) — stubs are
  implemented, tests were incorrectly marked as not-yet-implemented
- sessions.test.ts: Add 30 new tests covering file-system operations:
  findAgentsDir, getSessionsDir, getHistoryFilePath, getActiveSessions,
  getSessionSummary, startSession, stopSession, updateHeartbeat,
  cleanupStaleSessions — all use temp dirs to avoid test pollution
  Also expanded detectSquad, detectAIProcessesFast, getLiveSessionSummaryFast

Total: 63 → 104 tests passing, 0 skipped

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Post-execution instructions (branch, commit, PR workflow) now loaded from
.agents/config/post-execution.md instead of inline template string in run.ts.
Separates prompt content from code. Same pattern as approval-instructions.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This reverts commit 9999f92700c02af522e15cae29097a60f249cf15.
…eck (#389)

* fix(ci): run CI on PRs to develop — quality gate for agent PRs

Agents create PRs targeting develop. Without CI on develop PRs,
broken code gets merged undetected. This is the #1 quality gap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(quality): pre-commit hook runs build + tests on source changes

Agents were committing broken code (e.g. #384: tests that fail on
import). Now any commit touching .ts/.tsx/.js files must pass both
`npm run build` and `npm run test` before the commit goes through.

This is the #1 quality gate — prevents slop at the source.
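The hook itself lives in `.husky/` as a shell script. The same gate logic is sketched here in TypeScript for clarity. `git diff --cached --name-only` is a real git command; `touchesSource` and `runGate` are hypothetical names.

```typescript
import { execSync } from "node:child_process";

const SOURCE_EXTENSIONS = [".ts", ".tsx", ".js"];

// True if any staged path is a source file that requires build + tests.
function touchesSource(stagedFiles: string[]): boolean {
  return stagedFiles.some((file) => SOURCE_EXTENSIONS.some((ext) => file.endsWith(ext)));
}

// Hypothetical gate: a non-zero exit from build or tests makes execSync throw,
// which fails the hook and blocks the commit.
function runGate(): void {
  const staged = execSync("git diff --cached --name-only", { encoding: "utf8" })
    .split("\n")
    .filter(Boolean);
  if (!touchesSource(staged)) return; // docs-only commits skip the gate
  execSync("npm run build && npm run test", { stdio: "inherit" });
}
```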

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): align failing tests with implementation

- deploy.test: capture process.stdout.write instead of console.log
  (deployCommand uses writeLine which writes to stdout)
- eval.test: same stdout capture fix for JSON output test
- infra.test: use POSTGRES_PORT env var (default 5433) to match
  docker-compose pattern
- local.test: expect port 5432 in DATABASE_URL matching getLocalEnvVars()
- setup-checks.test: expect 'warning' (not 'missing') when Docker
  is not installed, matching checkDockerPrereqs() implementation
- Deleted verify-token.test.ts (tested nonexistent verifyToken export)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(agents): proper PR workflow — target develop, daemon env, auth check

- Post-execution: agents now open PRs targeting `develop` with structured body
- Daemon (autonomous.ts): unset CLAUDECODE env to allow nested claude sessions
- Auth check: downgrade missing credentials from block to warn (keychain auth)
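Unsetting `CLAUDECODE` for a child process means omitting the key, not setting it to an empty string (which leaves it defined). A minimal sketch using object-rest destructuring; the helper name is hypothetical.

```typescript
// Return a copy of env without CLAUDECODE; the input object is not mutated.
function stripClaudeCode(env: NodeJS.ProcessEnv): NodeJS.ProcessEnv {
  const { CLAUDECODE: _omitted, ...rest } = env;
  return rest;
}
```

The result can be passed directly as the `env` option of `child_process.spawn`.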

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(run): extract post-execution prompt to template file

Post-execution instructions (branch, commit, PR workflow) now loaded from
.agents/config/post-execution.md instead of inline template string.
Separates prompt content from code. Same pattern as approval-instructions.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add missing env-config.ts (imported by run.ts but never committed)
- Fix Commander action spread types with @ts-expect-error directives
- Add inquirer type declaration for create command

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…tines' (#392)

Regex only matched '## Routines' exactly, missing Engineering squad's
'## Growth Routines' header. Now matches any word before 'Routines'.
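One way to express "any word before Routines" as a regex, plus a helper that extracts the section body up to the next `##` heading. Both the exact pattern and `extractRoutines` are illustrative, not the project's code.

```typescript
// Matches "## Routines" and "## <Word> Routines" (e.g. "## Growth Routines").
const ROUTINES_HEADER = /^##[ \t]+(?:\w+[ \t]+)?Routines[ \t]*$/m;

// Body of the routines section, up to the next level-2 heading.
function extractRoutines(markdown: string): string | undefined {
  const match = ROUTINES_HEADER.exec(markdown);
  if (!match) return undefined;
  const rest = markdown.slice(match.index + match[0].length);
  const next = rest.search(/^##\s/m);
  return (next === -1 ? rest : rest.slice(0, next)).trim();
}
```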

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Multi-agent conversation orchestration for squad runs:
- Lead briefs → scanners discover → workers execute → lead reviews → verifiers check
- Shared transcript between agents for context continuity
- Convergence detection (continuation signals beat convergence signals)
- Cost ceiling ($25 default) and max turns (20 default) safety limits
- --task flag for founder directives (replaces lead briefing)
- Transcript persistence to .agents/conversations/{squad}/
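The two safety limits can be sketched as a bounded loop around a shared transcript. `runTurn` and `hasConverged` are hypothetical callbacks standing in for agent execution and convergence detection. Checking the cost ceiling before convergence is a choice made for this sketch, not something the commit specifies.

```typescript
interface TurnResult {
  agent: string;
  costUsd: number;
  output: string;
}

interface Limits {
  maxCostUsd: number;
  maxTurns: number;
}

// Defaults taken from the commit message.
const DEFAULT_LIMITS: Limits = { maxCostUsd: 25, maxTurns: 20 };

type StopReason = "converged" | "cost-ceiling" | "max-turns";

async function runConversation(
  runTurn: (turn: number, transcript: TurnResult[]) => Promise<TurnResult>,
  hasConverged: (transcript: TurnResult[]) => boolean,
  limits: Limits = DEFAULT_LIMITS,
): Promise<{ transcript: TurnResult[]; reason: StopReason }> {
  const transcript: TurnResult[] = []; // shared context for every agent
  let spent = 0;
  for (let turn = 0; turn < limits.maxTurns; turn++) {
    const result = await runTurn(turn, transcript);
    transcript.push(result);
    spent += result.costUsd;
    if (spent >= limits.maxCostUsd) return { transcript, reason: "cost-ceiling" };
    if (hasConverged(transcript)) return { transcript, reason: "converged" };
  }
  return { transcript, reason: "max-turns" };
}
```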

New files:
- src/lib/conversation.ts — types, transcript, agent classification, convergence
- src/lib/workflow.ts — turn execution, orchestration loop, transcript persistence

`squads run <squad>` now runs a full conversation instead of just the lead agent.
`squads run <squad> -a <agent>` still runs individual agents (unchanged).

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(auth): add verifyToken function and passing test suite

Closes #384

Adds verifyToken(token, apiUrl) to src/lib/auth.ts:
- Calls GET /auth/verify with Bearer token header
- Maps snake_case API response to camelCase (display_name→name, subscription_plan→plan)
- Returns null on non-ok responses, network errors, and timeouts/aborts
- 5-second abort timeout to prevent hanging

Creates test/verify-token.test.ts with all 6 specified tests:
1. Returns user data on 200 with snake_case→camelCase mapping
2. Returns null on non-ok response (e.g. 401)
3. Returns null on network error (silent)
4. Returns null on timeout/abort
5. Sends Bearer token in Authorization header
6. Builds correct URL from apiUrl param

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

* fix(auth): update verifyToken signature and response to match API spec

Revises the initial implementation based on actual API contract:
- Parameter order: verifyToken(apiUrl, token) — apiUrl first
- Endpoint: /auth/cli/verify (not /auth/verify)
- Response shape: { email, tenantId, tenantSlug, tenantName, status }
  mapping from snake_case { tenant_id, tenant_slug, tenant_name }
- Updates test/verify-token.test.ts to use vi.stubGlobal per-test
  with afterEach cleanup for better test isolation
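A sketch matching the revised contract described above: `apiUrl` first, `/auth/cli/verify`, Bearer header, 5-second timeout, snake_case tenant fields mapped to camelCase, and `null` on any failure. It assumes Node 18+ for global `fetch` and `AbortSignal.timeout`. The exact response typing and URL joining are assumptions.

```typescript
interface VerifyResult {
  email: string;
  tenantId: string;
  tenantSlug: string;
  tenantName: string;
  status: string;
}

// Assumed wire shape: tenant fields in snake_case, email/status unchanged.
interface VerifyResponse {
  email: string;
  tenant_id: string;
  tenant_slug: string;
  tenant_name: string;
  status: string;
}

function mapVerifyResponse(raw: VerifyResponse): VerifyResult {
  return {
    email: raw.email,
    tenantId: raw.tenant_id,
    tenantSlug: raw.tenant_slug,
    tenantName: raw.tenant_name,
    status: raw.status,
  };
}

// Returns null on non-ok responses, network errors, and timeouts.
async function verifyToken(apiUrl: string, token: string): Promise<VerifyResult | null> {
  try {
    const res = await fetch(`${apiUrl}/auth/cli/verify`, {
      headers: { Authorization: `Bearer ${token}` },
      signal: AbortSignal.timeout(5000), // abort after 5 seconds
    });
    if (!res.ok) return null;
    return mapVerifyResponse((await res.json()) as VerifyResponse);
  } catch {
    return null;
  }
}
```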

All 6 tests pass.

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
* test(commands): add unit tests for goal and list commands

Adds 21 new tests covering:
- goal.test.ts (14 tests): goalSetCommand, goalListCommand,
  goalCompleteCommand, goalProgressCommand — including edge cases
  for invalid indexes, non-existent squads, metric annotations
- list.test.ts (7 tests): JSON output validation, agent counts,
  no-project error handling, table and agents view rendering

Partial fix for #47 — covers 2 of 19 untested command files.

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

* test: add unit tests for feedback and progress commands

Closes #47 (partial — 2 of 15 untested commands)

Added 19 tests covering:
- feedback: add, show, parse history, rating validation, learnings
- progress: start/complete tasks, display, verbose mode, task IDs

Co-Authored-By: engineering/issue-solver <engineering-issue-solver@agents-squads.com>

Agent: engineering/issue-solver
Squad: engineering
Model: claude-opus-4-6

---------

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
…ification

- classifyAgent now uses role descriptions from SQUAD.md (primary) with
  name-based fallback — no more regex substring collisions
- Strip **bold** markers from agent names in table parser
- Replace regex convergence/continuation signals with phrase matching
- "keychain auth" → "OAuth" in run output
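A sketch of two of these changes: stripping `**bold**` from table cells, and phrase matching in which a continuation signal overrides a convergence signal. The phrase lists are hypothetical; the project's actual signals are not shown in this PR.

```typescript
// Strip markdown bold markers from a table cell, e.g. "**lead**" -> "lead".
function stripBold(cell: string): string {
  return cell.replace(/\*\*(.+?)\*\*/g, "$1").trim();
}

// Hypothetical phrase lists, for illustration only.
const CONTINUATION_PHRASES = ["next step", "still need", "blocked on", "follow up"];
const CONVERGENCE_PHRASES = ["all done", "no further action", "work complete", "nothing left"];

// Continuation signals win: any continuation phrase means "not converged".
function hasConverged(text: string): boolean {
  const lower = text.toLowerCase();
  if (CONTINUATION_PHRASES.some((p) => lower.includes(p))) return false;
  return CONVERGENCE_PHRASES.some((p) => lower.includes(p));
}
```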

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- session.test.ts: 11 tests covering sessionStartCommand,
  sessionStopCommand, sessionHeartbeatCommand, and detectSquadCommand
  (start/stop/heartbeat lifecycle, quiet mode, missing .agents dir)
- learn.test.ts: 14 tests covering learnCommand, learnShowCommand,
  and learnSearchCommand (default squad, specific squad, fallback,
  category inference, tag extraction, search, filters)

Part of #47 — adds coverage for 2 more previously untested commands.

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

Co-authored-by: kokevidaurre <kokevidaurre@users.noreply.github.com>
agents-squads bot and others added 17 commits March 9, 2026 21:31
* fix(lint): eliminate 62 ESLint no-unused-vars warnings

Closes #550

Removes unused imports and renames unused variables across 10 files.
All no-unused-vars warnings reduced from 62 → 0.
Remaining 34 warnings are no-explicit-any (tracked in #551).

Files changed:
- daemon.ts: remove LoopState, SquadSignal, GhIssue, defaultState, getLastRunAge
- dashboard.ts: remove formatCostBar, fetchRateLimits
- doctor.ts: remove homedir, bold, icons imports
- health.ts: rename options → _options (unused param)
- run.ts: remove getPRsWithReviewFeedback, buildReviewTask, getBeliefsContext,
          homedir, checkNewPRs; use empty catch for fail-safe blocks
- stats.ts: remove getOutcomeRecords
- anthropic.ts: rename skillId → _skillId (stub functions)
- cognition.ts: remove execSync import
- outcomes.ts: rename unmergedPRs → _unmergedPRs
- workflow.ts: remove readFileSync, execAsync, promisify;
              rename model → _model, stderr → _stderr

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli
Model: claude-sonnet-4-6

* fix(lint): restore exec import and add callback types in workflow.ts

Re-adds `exec` import removed by lint cleanup (still used at line 194).
Adds explicit types to exec callback params to fix TS7006 errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #348

Cut verbose features prose (Dashboard, Memory System, Session Detection,
Autonomous Execution, Claude Code Integration JSON) and condensed
Configuration examples into a quick reference. Kept all command tables,
quick start, why squads, providers, project structure, and dev setup.

Result: clear getting started in <5 steps, links to docs for advanced
topics, all stale Claude Code hooks JSON removed.

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Professional README with clear usage, base squads, prerequisites.
Remove: old tarball, briefs/, .agents/, dashboard.png.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
* test(outcomes): add 37 tests for outcome tracking module

Closes #574

Covers:
- gradeExecution: all grade paths (A, B, C, D, F) with edge cases
- computeScorecard: empty records, period filtering, rate calculations
- computeAllScorecards: unique agent grouping, sort, persist
- getAgentQualityScore: minimum threshold, grade average
- getOutcomeScoreModifier: waste/merge/quality modifier thresholds
- recordArtifacts: dedup, no-repo guard, gh CLI failures
- pollOutcomes: PR/issue state transitions, age-out, graceful failures
- getScorecards / getOutcomeRecords: cached reads

Co-Authored-By: cli/issue-solver <cli-issue-solver@agents-squads.com>

Agent: cli/issue-solver
Squad: cli

* refactor(test): simplify outcomes.test.ts — extract makeOutcomes() helper

- Add makeOutcomes() helper to eliminate 10+ inline outcomes object copies
- Replace all { ...makeRecord().outcomes, ... } antipattern with makeOutcomes({ ... })
- Remove redundant mockReadFileSync.mockReturnValue() calls in modifier tests
  (setupStore() already sets the mock; duplicating it was confusing)
- Fix implicit test ordering dependency in bonus modifier test by using
  explicit setupStore() instead of relying on previous test's mock state

All 37 tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Three changes to prevent test file collision during parallel execution:

1. pool: 'forks' in vitest.config.ts — each file runs in its own process,
   preventing GIT_DIR env var deletions in one file's beforeAll from
   leaking into other concurrently-running test files.

2. HOME: cwd in runCli env — the CLI reads ~/.squads/config.json via
   os.homedir(). With HOME=cwd, each test's config writes go to an
   isolated temp path instead of the shared real home directory.

3. Math.random() suffix on all temp dir names — eliminates same-millisecond
   collision on machines where Date.now() has low resolution.
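Changes 2 and 3 can be sketched as two small test helpers. On POSIX, `os.homedir()` honours the `HOME` environment variable, which is why pointing `HOME` at a per-test directory isolates `~/.squads/config.json`. The helper names are hypothetical. Node's `fs.mkdtempSync` is a built-in alternative that also appends a random suffix.

```typescript
import { existsSync, mkdirSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Timestamp plus random suffix: two dirs created in the same millisecond still differ.
function uniqueTempDir(prefix: string): string {
  const suffix = Math.random().toString(36).slice(2, 10);
  const dir = join(tmpdir(), `${prefix}-${Date.now()}-${suffix}`);
  mkdirSync(dir, { recursive: true });
  return dir;
}

// Env for a spawned CLI under test: HOME points at the test's own directory.
function isolatedEnv(cwd: string): NodeJS.ProcessEnv {
  return { ...process.env, HOME: cwd };
}

function removeTempDir(dir: string): void {
  if (existsSync(dir)) rmSync(dir, { recursive: true, force: true });
}
```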

Closes #578

Co-Authored-By: Claude <noreply@anthropic.com>

Agent: cli/issue-solver
Squad: cli

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Adds test/commands/run.test.ts covering the exported runCommand and
runSquadCommand functions with 22 unit tests across 6 describe blocks:
- no squads directory (3 tests)
- --cloud flag handling (5 tests)
- target not found error paths (5 tests)
- preflight provider CLI check (3 tests)
- slash syntax parsing (2 tests)
- runSquadCommand delegation (3 tests)

All tests mock fs, child_process, and all lib modules. Uses
SQUADS_SKIP_CHECKS=1 to bypass preflight where needed.

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
#566 (#591)

Closes #566

Covers:
- hasUnresolvedEscalation: blocked/needs-human label detection, fail-open on gh errors
- scoreSquads: PAUSED signal (score=0) for blocked squads, normal scoring for clean squads
- classifyRunOutcome: failed/skipped/completed paths
- checkCooldown: cooldown enforcement and expiry
- defaultState: initial state structure

The escalation pause logic was already implemented in squad-loop.ts.
This PR adds the missing test coverage to lock in the behavior.
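The behaviors under test can be summarized in a short sketch. The names and signatures here are hypothetical (not the real `checkCooldown`/`scoreSquads` in `squad-loop.ts`). It shows cooldown expiry, PAUSED as score 0, and fail-open when the escalation lookup errors.

```typescript
interface LoopState {
  lastRunAt: Record<string, number>; // squad -> epoch ms of last dispatch
}

// A squad may run again only once cooldownMs has elapsed since its last run.
function cooldownElapsed(state: LoopState, squad: string, cooldownMs: number, now: number = Date.now()): boolean {
  const last = state.lastRunAt[squad];
  return last === undefined || now - last >= cooldownMs;
}

// Blocked squads score 0 (PAUSED). If the lookup throws (e.g. `gh` fails),
// fail open and score normally.
function scoreWithEscalation(baseScore: number, isBlocked: () => boolean): number {
  let blocked = false;
  try {
    blocked = isBlocked();
  } catch {
    blocked = false;
  }
  return blocked ? 0 : baseScore;
}
```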

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
…culateSquadCostProjections, and plan detection (#592)

- formatCostBar: 7 tests — boundary values, clamping, rounding, width
- calculateROIMetrics: 9 tests — cost per goal/commit/PR, ROI multiplier, projections, null handling
- calculateSquadCostProjections: 5 tests — null/empty input, multi-squad projections, weekly/monthly ratios
- detectPlan/getPlanType/isMaxPlan/getPlanDescription: 17 tests — all env signal paths, priority ordering, descriptions
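The boundary, clamping, rounding, and width cases listed for `formatCostBar` map onto a function shaped roughly like this. The name `costBar`, its signature, and the bar characters are hypothetical.

```typescript
// Render spent/budget as a fixed-width bar: clamp the ratio to [0, 1],
// round to whole cells, and always return exactly `width` characters.
function costBar(spent: number, budget: number, width = 20): string {
  const ratio = budget > 0 ? spent / budget : 0;
  const clamped = Math.min(1, Math.max(0, ratio));
  const filled = Math.round(clamped * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}
```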

Closes #556

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
* feat(run): inject SYSTEM.md as immutable Layer 1 in agent prompt

- Add loadSystemProtocol() to run-context.ts — reads .agents/SYSTEM.md
- Inject SYSTEM.md first in prompt, marked as [IMMUTABLE — NEVER OVERRIDE]
- Skip loadApprovalInstructions() and loadPostExecution() when SYSTEM.md loaded
- Mark both deprecated functions with @deprecated JSDoc
- Graceful fallback: if no SYSTEM.md, legacy config files still used

Closes #585

Co-Authored-By: Claude <noreply@anthropic.com>

* refactor(run-context): extract readAgentsFile helper, fix abstraction boundary

- Extract readAgentsFile() to eliminate boilerplate across 3 loaders
- loadSystemProtocol() returns raw content; caller wraps with [IMMUTABLE] markers
- loadApprovalInstructions() uses shared helper (removes duplicate find+read+warn)
- loadPostExecution() uses shared helper (removes duplicate find+read+warn)
- Add verbose log when SYSTEM.md absent (was silent)
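A sketch of the loader shape described in these two commits: one shared `readAgentsFile()` helper, SYSTEM.md injected first and wrapped by the caller, and the legacy files used only as a fallback. The marker strings are placeholders (the real marker text differs). The location of `approval-instructions.md` under `.agents/config/` is an assumption, and `buildPromptLayers` is a hypothetical name.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Placeholder markers; the project's actual wording is not reproduced here.
const IMMUTABLE_OPEN = "[IMMUTABLE]";
const IMMUTABLE_CLOSE = "[/IMMUTABLE]";

// Shared loader: read a file under .agents/, or null if it is absent.
function readAgentsFile(projectRoot: string, relPath: string): string | null {
  const file = join(projectRoot, ".agents", relPath);
  if (!existsSync(file)) return null;
  return readFileSync(file, "utf8");
}

// SYSTEM.md first (wrapped by the caller, not the loader); legacy files only as fallback.
function buildPromptLayers(projectRoot: string, agentPrompt: string): string {
  const layers: string[] = [];
  const system = readAgentsFile(projectRoot, "SYSTEM.md");
  if (system !== null) {
    layers.push(`${IMMUTABLE_OPEN}\n${system}\n${IMMUTABLE_CLOSE}`);
  } else {
    for (const legacy of ["config/approval-instructions.md", "config/post-execution.md"]) {
      const content = readAgentsFile(projectRoot, legacy);
      if (content !== null) layers.push(content);
    }
  }
  layers.push(agentPrompt);
  return layers.join("\n\n");
}
```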

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
* feat(context): phase-ordered execution with role-based context cascade

Implement the full context overhaul for autonomous agent orchestration:

- Role-based context (scanner/worker/lead/coo) with token budgets and
  section gating — scanners get 3 sections, leads get all 10
- Topological phase ordering via depends_on in SQUAD.md frontmatter
  (Kahn's algorithm, cycle detection, wildcard support)
- --phased flag for autopilot: sequential phases, parallel within
- Context injection into conversation mode (workflow.ts)
- CLAUDECODE env var properly stripped via destructuring (not empty string)
- Backward-compatible: priorities.md falls back to goals.md, --phased is opt-in
- Pre-commit hook: remove false positives for documented file conventions
- README updated with "How Agents Think" context philosophy section
- All 1608 tests pass
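Phase ordering from `depends_on` is a layered topological sort. The sketch below uses Kahn's algorithm: each phase is the set of squads whose dependencies are all in earlier phases, and any squad left unplaced indicates a cycle. It is not the project's `computePhases`. Wildcard dependencies are not modeled, dependencies on unknown squads are ignored (an assumption), and the dependency graph in the example is hypothetical.

```typescript
// Group squads into phases: sequential between phases, parallel within one.
function groupIntoPhases(dependsOn: Record<string, string[]>): string[][] {
  const squads = Object.keys(dependsOn);
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const s of squads) {
    inDegree.set(s, 0);
    dependents.set(s, []);
  }
  for (const s of squads) {
    for (const dep of dependsOn[s]) {
      if (!inDegree.has(dep)) continue; // unknown squad: ignored in this sketch
      inDegree.set(s, (inDegree.get(s) ?? 0) + 1);
      dependents.get(dep)!.push(s);
    }
  }
  const phases: string[][] = [];
  let ready = squads.filter((s) => inDegree.get(s) === 0).sort();
  let placed = 0;
  while (ready.length > 0) {
    phases.push(ready);
    placed += ready.length;
    const next: string[] = [];
    for (const s of ready) {
      for (const d of dependents.get(s)!) {
        const remaining = (inDegree.get(d) ?? 0) - 1;
        inDegree.set(d, remaining);
        if (remaining === 0) next.push(d); // all deps satisfied: next phase
      }
    }
    ready = next.sort();
  }
  if (placed !== squads.length) {
    const stuck = squads.filter((s) => (inDegree.get(s) ?? 0) > 0);
    throw new Error(`dependency cycle among: ${stuck.join(", ")}`);
  }
  return phases;
}

const phases = groupIntoPhases({
  research: [],
  intelligence: [],
  product: ["research", "intelligence"],
  company: ["product"],
});
```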

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(eval): default post-run COO evaluation after every squad run

After any squad or agent execution, the COO (company-lead) automatically
evaluates outputs against priorities and directives. Generates feedback.md
and active-work.md per squad. Cross-squad assessment for multi-squad runs.

- Single squad: `squads run engineering` → eval engineering
- Single agent: `squads run engineering/issue-solver` → eval engineering
- Autopilot cycle: eval all dispatched squads after each cycle
- Skip recursion: company squad runs don't trigger self-evaluation
- Opt-out: `--no-eval` flag to skip evaluation

Co-Authored-By: Claude <noreply@anthropic.com>

* feat(eval): trajectory-aware evaluation with history tracking

The COO evaluation now:
- Reads previous feedback.md FIRST as baseline (not blank-slate)
- Measures direction (improving/stable/declining), not just position
- Maintains a history table (last 10 grades) so trajectory is visible
- Uses 7-day GitHub window (not 24h) for meaningful trend analysis
- Cross-squad review written to cross-squad-review.md
- Explicit: "C improving from F > B declining from A"

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(eval): inject --task directive into runAgent prompt

The task option was only wired into conversation mode, not single-agent
execution. COO evaluation task was being silently dropped. Now injected
as a TASK DIRECTIVE section that overrides default agent behavior.

Co-Authored-By: Claude <noreply@anthropic.com>

* docs(readme): rewrite intro with architectural vision

Position Squads as OS-native alternative to framework-based agent tools.
Lead with the three architectural pillars: native LLM CLIs as runtime,
filesystem as memory, GitHub as message bus.

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(run): cleanup worktrees after agent execution

Worktrees were created per execution but never removed, accumulating
120+ stale directories. Now cleanup runs in all execution paths:
foreground, background, detached, and non-Anthropic providers.
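For in-process execution paths, the guarantee comes from `try`/`finally`. A minimal sketch follows; `withWorktree` and `removeGitWorktree` are hypothetical helpers, while `git worktree remove --force <path>` is a real git command. Detached runs would need the removal inside the spawned shell script instead, which this sketch does not cover.

```typescript
import { execFileSync } from "node:child_process";

// Run work in a worktree and always remove it afterwards, success or failure.
async function withWorktree<T>(
  create: () => string,
  run: (dir: string) => Promise<T>,
  remove: (dir: string) => void,
): Promise<T> {
  const dir = create();
  try {
    return await run(dir);
  } finally {
    try {
      remove(dir);
    } catch {
      // best effort: a failed cleanup must not mask the run's own result
    }
  }
}

// Hypothetical git-backed remover.
function removeGitWorktree(dir: string): void {
  execFileSync("git", ["worktree", "remove", "--force", dir], { stdio: "ignore" });
}
```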

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: remove hardcoded localhost URLs and Docker references from CLI

Source changes:
- Replace process.env.SQUADS_BRIDGE_URL || 'http://localhost:8088' with
  getEnv().bridge_url in sync.ts and autonomy.ts
- Replace "Is the bridge running?" and "squads stack up" messages with
  "squads login" guidance

Test updates (27 tests across 6 files):
- env-config: default environment is now 'prod', local env has empty URLs
- services: checkServiceAvailable uses fetch health checks, not Docker
- local: new service interface (url instead of port/healthUrl/configPath)
- setup-checks: Docker is optional (always returns 'ok')
- templates/memory: match new interfaces and env requirements

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(lint): use const for non-reassigned Set binding in computePhases

Co-Authored-By: Claude <noreply@anthropic.com>

* fix: remove Docker/infrastructure leaks from all source files

Complete cleanup of internal infrastructure references from OSS CLI:
- services.ts: remove Docker container checks, use fetch-based health
- local.ts: remove port/healthUrl/configPath, use simple URL-based status
- env-config.ts: default to 'prod' environment, local env uses empty URLs
- setup-checks.ts: Docker is optional (always returns 'ok')
- health.ts, memory.ts, deploy.ts, costs.ts: graceful degradation
- cli.ts, approval.ts, context-feed.ts, cost.ts, history.ts, trigger.ts,
  stack-config.ts, telemetry.ts: minor cleanup

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(test): set SQUADS_BRIDGE_URL in memory search/extract tests

Tests for memorySearchCommand and memoryExtractCommand need bridge URL
set to avoid early return from getBridgeUrl() empty check.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
- Fix ReferenceError: provider is not defined in executeWithClaude
- Add missing outcomes.ts, insights.ts, stats.ts from develop
- Every squads run crashed on v0.7.0, D1 retention effectively 0%

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
The pre-commit hook contained client names, personal emails, and
financial terms in plaintext — on a public repo. Moved blocked
patterns to .husky/.blocked-patterns (gitignored, local-only).
Hook reads from external file if present, skips PII check if not.

Co-Authored-By: Claude <noreply@anthropic.com>
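The external-file approach in this commit can be illustrated in TypeScript, although the real hook is a shell script and the next commit comments it out entirely. In this sketch, `loadBlockedPatterns` and `findBlockedPatterns` are hypothetical helpers. The one-pattern-per-line format with `#` comments and the case-insensitive substring match are also assumptions.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical sketch of the hook's logic; the real hook is shell and may use grep regexes.
function loadBlockedPatterns(path = ".husky/.blocked-patterns"): string[] {
  if (!existsSync(path)) return []; // no local, gitignored file: the PII check is skipped
  return readFileSync(path, "utf8")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

function findBlockedPatterns(text: string, patterns: string[]): string[] {
  const lower = text.toLowerCase();
  return patterns.filter((p) => lower.includes(p.toLowerCase()));
}
```

The key property is that the patterns themselves never live in the tracked hook, so a public clone contains no sensitive strings.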
Hook contained client names and personal data on a public repo.
Commenting out entirely until proper local-only solution is in place.

Co-Authored-By: Claude <noreply@anthropic.com>
Symlink was accidentally committed. Already in .gitignore.

Co-Authored-By: Claude <noreply@anthropic.com>
* feat: reset first-run experience — 4 starter squads, context cascade, placeholder sentinel

- Default init creates 4 core squads (intelligence, research, product, company) with 14 agents
- Add SYSTEM.md (Layer 0) and directives.md (Layer 3) to context cascade
- BUSINESS_BRIEF.md has PLACEHOLDER sentinel — agents stop and ask user to edit instead of producing garbage
- Upgrade all agent templates with structured output formats and quality rules
- Add product squad (lead, scanner, worker) with depends_on intelligence/research
- Rename create → add, remove list (use status), add --pack option to init
- Clean repo root: remove changeset, husky, changelog, scripts, telemetry docs
- Add AGENTS.md (vendor-neutral) and rewrite CLAUDE.md with cascade documentation
- Update README with 4-step customization guide (Brief → Directives → Goals → Priorities)
- Fix broken command references in company agents (squads feedback add → real commands)

Co-Authored-By: Claude <noreply@anthropic.com>
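A layered context cascade can be assembled by sorting files by layer and concatenating whichever ones exist. The sketch below assumes a hypothetical `assembleContext` helper and an injected file reader. The commit names only Layer 0 (`SYSTEM.md`) and Layer 3 (`directives.md`), so the example uses just those two. The real run-context code may also select files by role.

```typescript
// Hypothetical sketch of a layered context cascade; not the actual run-context code.
interface ContextLayer { layer: number; file: string }

function assembleContext(layers: ContextLayer[], read: (file: string) => string | undefined): string {
  return [...layers]
    .sort((a, b) => a.layer - b.layer) // Layer 0 first: most general to most specific
    .map(({ layer, file }) => {
      const body = read(file);
      // Missing files are skipped so a partially customized repo still yields context.
      return body === undefined ? undefined : `<!-- Layer ${layer}: ${file} -->\n${body.trim()}`;
    })
    .filter((section): section is string => section !== undefined)
    .join("\n\n");
}

const files = new Map([["SYSTEM.md", "System rules."], ["directives.md", "Current directives."]]);
console.log(assembleContext(
  [{ layer: 3, file: "directives.md" }, { layer: 0, file: "SYSTEM.md" }],
  (f) => files.get(f),
));
```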

* test: add first-run content verification — 4 squads, 14 agents, cascade files, sentinel

Verifies init creates exactly the promised scaffolding:
- 4 core squads (company, intelligence, product, research)
- 14 agent definition files
- SYSTEM.md, BUSINESS_BRIEF.md, directives.md cascade files
- PLACEHOLDER sentinel in BUSINESS_BRIEF.md
- CLAUDE.md and AGENTS.md at repo root

Co-Authored-By: Claude <noreply@anthropic.com>

* feat: default business brief about AI agents, conditional PLACEHOLDER sentinel

- --yes default: "A startup using AI agents to automate business operations"
- Research focus: "AI agent and automation market" — produces interesting output on first run
- PLACEHOLDER sentinel only appears when user skips description in interactive mode
- Test updated: --yes brief has real content, no sentinel

Co-Authored-By: Claude <noreply@anthropic.com>
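The conditional sentinel reduces to two small checks: emit it only when no description was given, and have agents look for it before doing any work. This is a minimal sketch. `renderBusinessBrief`, `briefNeedsEditing`, and the template text are hypothetical, and the real `BUSINESS_BRIEF.md` template differs.

```typescript
// Hypothetical sketch; the real BUSINESS_BRIEF.md template and sentinel wording may differ.
const PLACEHOLDER = "PLACEHOLDER";

function renderBusinessBrief(description?: string): string {
  const text = description?.trim();
  // Only an empty description (skipped in interactive mode) produces the sentinel.
  const body = text ? text : `${PLACEHOLDER}: describe your business here before running agents.`;
  return `# Business Brief\n\n${body}\n`;
}

function briefNeedsEditing(brief: string): boolean {
  // Agents check this first and stop to ask the user instead of producing generic output.
  return brief.includes(PLACEHOLDER);
}

// --yes path: a real default description, so no sentinel.
console.log(briefNeedsEditing(renderBusinessBrief("A startup using AI agents to automate business operations")));
```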

* feat: v0.2.1 first-run fixes — gh warning, auto-commit, business brief, squads run

- gh CLI: change from hard error to warning (not required for init/run)
- gh CLI: fix install hint from `brew install gh` to https://cli.github.com
- Remove duplicate Claude CLI check in runAuthChecks
- Auto-commit scaffolding on init (agents need HEAD for worktrees)
- Default business brief: "autonomous execution" + track big AI players
- Getting Started step 3: `squads run` autopilot (replaces `squads dash`)
- Add Docker first-run test environment
- Version bump to 0.2.1

Co-Authored-By: Claude <noreply@anthropic.com>
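Two of these changes are easy to picture in code: a `gh` check that can only warn, and an auto-commit so `HEAD` exists before agents create worktrees. This is a minimal sketch with hypothetical names. The commit message is illustrative, and the git runner is injected for testability. A real runner could be `(args) => execFileSync("git", args, { stdio: "ignore" })`, which will still fail if git has no `user.name`/`user.email` configured.

```typescript
import { spawnSync } from "node:child_process";

type CheckResult = { level: "ok" | "warn"; message: string };

// gh is optional for init/run: a missing binary yields a warning, never an error.
function checkGhCli(): CheckResult {
  const res = spawnSync("gh", ["--version"], { encoding: "utf8" });
  if (res.error || res.status !== 0) {
    return { level: "warn", message: "GitHub CLI (gh) not found; install it from https://cli.github.com" };
  }
  return { level: "ok", message: res.stdout.split("\n")[0] };
}

// Commit the scaffolding so a HEAD commit exists for `git worktree add`.
type GitRunner = (args: string[]) => void;

function autoCommitScaffolding(git: GitRunner, message = "chore: initialize squads scaffolding"): void {
  git(["add", "-A"]);
  git(["commit", "-m", message]);
}
```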

* fix: lint error — prefer-const for useCaseConfig

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(ci): update smoke tests for 4-squad init

- Inline smoke test: check 4 core squads (not engineering/marketing/ops)
- Inline smoke test: verify SYSTEM.md, directives.md, AGENTS.md
- Inline smoke test: fix state file paths (research/lead not researcher)
- Restore e2e-smoke.sh (was accidentally deleted)
- Replace `squads list` with `squads status` in smoke script

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Jorge Vidaurre <jorge@agents-squads.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…elease

Also adds permissions: contents: read to ci.yml (CodeQL fix)

Co-Authored-By: Claude <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a foundational update to the squads-cli, primarily focusing on a significantly improved first-run experience and the introduction of a local-first cognition engine. It streamlines the core command-line interface, consolidates autonomous operation logic, and enhances how AI agents receive context and learn from their actions. The changes aim to make the system more intuitive for new users and more intelligent in its continuous operation.

Highlights

  • First-Run Experience Overhaul: Completely reworked the initial user setup, including a new squads init flow that now auto-commits scaffolding and works gracefully without the gh CLI.
  • Cognition Engine Integration: Introduced a local-first cognition engine (src/lib/cognition.ts) enabling agents to learn, synthesize signals, evaluate decisions, and reflect on outcomes.
  • Autopilot Mode Enhancement: The squads run command now serves as the primary entry point for autopilot mode, supporting phase-ordered execution based on squad dependencies and role-based context injection.
  • New AI Workforce Intelligence Command: Added a squads stats command to provide executive-level insights, scorecards, and ROI metrics for the AI workforce.
  • Documentation and Template Refinement: Extensively updated README.md, AGENTS.md, and seed templates to improve clarity, agent guidance, and reflect the new architecture and first-run experience.
  • Infrastructure and Command Streamlining: Removed several legacy commands (list, stack, cron, tonight, live, env list, env activate) and infrastructure-related files, simplifying the local setup and focusing on cloud-native integration.
  • E2E Testing for User Journey: Implemented a new E2E smoke test (scripts/e2e-smoke.sh) to validate the complete first-run user journey in an isolated environment.
Changelog
  • .changeset/config.json
    • Removed Changeset configuration file.
  • .changeset/github-app-bot.md
    • Removed changelog entry for GitHub App bot identity.
  • CHANGELOG.md
    • Removed the project's changelog file, indicating a shift in changelog management.
  • scripts/release.sh
    • Removed the release automation script.
  • scripts/update-docs-changelog.cjs
    • Removed the script for updating the documentation changelog.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/ci.yml
Activity
  • The first-run experience has been completely reset for new users.
  • squads init --yes now functions without requiring the gh CLI, issuing a warning instead of an error.
  • The default business brief is set to 'autonomous execution + track big AI players', designed to produce interesting initial output.
  • Scaffolding is now auto-committed, allowing agents to immediately utilize git worktrees.
  • Four core squads (research, company, intelligence, product) are now provided to every user by default.
  • The 'Getting Started' guide highlights squads run as the autopilot entry point.
  • Execution now features phase-ordered processing, role-based context injection, and general infrastructure cleanup.
  • All 1732 existing unit tests are passing, and all CI checks are green.
  • Over 200 new tests have been added to the codebase.
  • Security enhancements include PII removal and hook cleanup.
  • The README.md has been rewritten for improved clarity and comprehensiveness.

kokevidaurre disabled auto-merge March 13, 2026 19:21
kokevidaurre merged commit 4b59a8c into main Mar 13, 2026
18 checks passed

gemini-code-assist bot left a comment


Code Review

This is a massive and impressive pull request that fundamentally resets the project's first-run experience and internal architecture. The changes are centered around simplifying the user setup, providing a more guided initial experience with default squads, and introducing a sophisticated intelligence layer for autonomous operation and performance tracking.

Key improvements include:

  • A unified squads run command that replaces the old daemon and serves as the single entry point for both manual and autonomous execution.
  • A much richer scaffolding from squads init, providing four core squads out-of-the-box.
  • A new role-based, layered context injection system (run-context.ts) that should significantly improve agent performance and relevance.
  • The introduction of an intelligence layer with outcome tracking (outcomes.ts), cognition (cognition.ts), and insights (insights.ts), which is a major step towards a self-improving system.
  • Significant refactoring that moves complex logic out of command files into dedicated library modules (squad-loop.ts, run-context.ts), improving maintainability.
  • Removal of Docker as a hard dependency, making the tool more accessible.

The overall direction is very strong. My feedback is focused on a few areas of inconsistency and potential robustness improvements that came up during this large-scale refactoring.

I am having trouble creating individual review comments. Click here to see my feedback.

package-lock.json (3)

critical

The version in package-lock.json (0.7.1) is inconsistent with the version in package.json (0.2.1). This can lead to dependency and publishing issues. Please run npm install again to regenerate the lockfile with the correct version.
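A CI guard could catch this drift before release. Below is a minimal sketch comparing the version fields npm records in the lockfile (top-level `version` and, for lockfileVersion 2/3, `packages[""].version`) against `package.json`. The function name is hypothetical. Running `npm install --package-lock-only` regenerates the lockfile without touching `node_modules`.

```typescript
// Hypothetical CI guard; field names follow npm's package-lock.json format.
interface PackageJson { version: string }
interface PackageLock { version?: string; packages?: Record<string, { version?: string }> }

function lockfileVersionMismatches(pkg: PackageJson, lock: PackageLock): string[] {
  const problems: string[] = [];
  if (lock.version !== undefined && lock.version !== pkg.version) {
    problems.push(`package-lock.json version ${lock.version} != package.json ${pkg.version}`);
  }
  // The root package entry is keyed by the empty string in lockfileVersion 2/3.
  const root = lock.packages?.[""]?.version;
  if (root !== undefined && root !== pkg.version) {
    problems.push(`package-lock.json packages[""].version ${root} != package.json ${pkg.version}`);
  }
  return problems;
}
```

In CI the two files would be read with `JSON.parse(readFileSync(...))`, and the job would fail if the returned list is non-empty.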

src/commands/run.ts (1417-1612)

high

The main while (running) loop in runAutopilot lacks a top-level try...catch block. An unhandled exception in any of the awaited functions within the loop (e.g., scoreSquads, runCognitionCycle) could crash the entire autopilot process. For a long-running daemon, it would be more robust to wrap the loop's body in a try...catch to log the error and continue to the next cycle.
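The suggested fix can be sketched as a loop that isolates each cycle in its own `try...catch`. Everything here is illustrative: `runAutopilotLoop`, the `LoopOptions` shape, and the `isRunning` callback (standing in for the `running` flag) are hypothetical names. In the real code, `cycle` would wrap calls such as `scoreSquads` and `runCognitionCycle`.

```typescript
// Hypothetical sketch of a crash-resistant autopilot loop; names are illustrative.
interface LoopOptions {
  isRunning: () => boolean;           // mirrors the `while (running)` flag
  cycle: () => Promise<void>;         // e.g. scoring, dispatch, cognition for one cycle
  intervalMs?: number;
  onError?: (err: unknown, cycleNo: number) => void;
}

async function runAutopilotLoop(opts: LoopOptions): Promise<number> {
  const onError = opts.onError ?? ((err: unknown, n: number) => console.error(`cycle ${n} failed:`, err));
  let failures = 0;
  let cycleNo = 0;
  while (opts.isRunning()) {
    cycleNo++;
    try {
      await opts.cycle();
    } catch (err) {
      // Log and continue: one bad cycle must not take down the long-running daemon.
      failures++;
      onError(err, cycleNo);
    }
    if (opts.intervalMs) await new Promise((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  return failures;
}
```

Returning the failure count leaves room for a caller-side policy, such as backing off or alerting after repeated failures, without changing the loop itself.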

CLAUDE.md (81)

medium

This document mentions that PRs should be made to the develop branch. However, other parts of the repository and historical context (like the previous release script) seem to point to a main-based workflow. It would be good to clarify the branching strategy and ensure it's consistent across all contribution guides to avoid confusion.

src/cli.ts (268-269)

medium

For consistency and better maintainability, since the command has been renamed from create to add, it would be clearer to also rename the source file src/commands/create.js to src/commands/add.js and the function createCommand to addCommand.

src/commands/list.ts (68)

medium

The suggested command squads run research/researcher refers to an agent that seems to have been removed in this PR. The new lead agent for the research squad appears to be lead. Please update the suggestion to squads run research/lead.

    writeLine(`  ${colors.dim}  squads run research/lead${RESET}`);

src/lib/anthropic.ts (202)

medium

The implementation for deleteSkill appears to have been removed, leaving it as an empty function. If this is intentional, consider adding a // TODO comment explaining why it's stubbed out. If it's no longer used, it might be better to remove the function entirely to avoid confusion.

src/lib/anthropic.ts (217)

medium

Similar to deleteSkill, the implementation for getSkill seems to be missing. It currently just returns null. If this is a placeholder for future implementation, a // TODO comment would be beneficial. Otherwise, if the function is obsolete, it should be removed.
