
Conversation


@getworken getworken commented Jan 28, 2026

Summary by CodeRabbit

  • New Features

    • Project management: import existing projects, reset project state (quick or full), open projects directly in IDEs (VS Code, Cursor, Antigravity)
    • Knowledge files system: manage project-specific documentation files
    • Database health checks and infrastructure verification
    • Enhanced security tracking with Basic Auth support
    • IDE preference settings
    • Quality gates verification system
    • Structured logging and improved error handling with error boundary
  • Improvements

    • Better rate limiting detection and retry handling
    • WebSocket connection stability improvements
    • Enhanced project validation and settings management

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive infrastructure and feature-management enhancements: mandatory database persistence requirements in spec creation; a complete database-layer reorganization with SQLAlchemy models and connection management; new server endpoints for knowledge files and IDE integration; enhanced security and authentication utilities; session event tracking with rate-limit handling in agents; and extensive testing infrastructure alongside frontend UI improvements for IDE selection and project management.

Changes

Cohort / File(s) | Summary

**Spec & Documentation Templates**
.claude/commands/create-spec.md, .claude/commands/expand-project.md, .claude/templates/coding_prompt.template.md, .claude/templates/initializer_prompt.template.md, .claude/templates/testing_prompt.template.md, CLAUDE.md
Introduces Phase 3b Database Requirements (mandatory when database used), augments feature counts to include 5 infrastructure features, expands feature creation workflow with bulk JSON examples, adds XML fallback for feature creation, enhances refactoring feature handling, adds mock data detection and persistence verification steps, clarifies feature precedence and mandatory categorization, and expands token efficiency guidance.

**Configuration & Logging Setup**
api/config.py, api/logging_config.py, api/agent_types.py, registry.py
Introduces AutocoderConfig with centralized Pydantic-based settings management and reload capabilities, adds AgentType enum for agent categorization, provides setup_logging and get_logger utilities with rotating file handlers, adds Ollama model support alongside Claude models with CLAUDE_MODELS and OLLAMA_MODELS.

**Database Layer Refactoring**
api/models.py, api/connection.py, api/migrations.py, api/feature_repository.py, api/database.py
Consolidates ORM models (Feature, FeatureAttempt, FeatureError, Schedule, ScheduleOverride), introduces robust SQLite connection management with WAL and pragma configuration, provides transaction retry logic with exponential backoff, implements FeatureRepository pattern with CRUD and dependency operations, and establishes comprehensive migrations system for schema evolution.

**Rate Limiting & Session Tracking**
rate_limit_utils.py, agent.py, progress.py
Adds parse_retry_after and is_rate_limit_error utilities, implements exponential backoff and rate-limit detection in agent sessions, emits session lifecycle events (session_started, session_ended), integrates execute_with_retry and robust database connections for resilience.

**Server Endpoints & API Expansion**
server/routers/projects.py, server/routers/assistant_chat.py, server/routers/spec_creation.py, server/routers/terminal.py, server/routers/expand_project.py, server/routers/schedules.py, server/schemas.py
Adds knowledge file management endpoints (GET/POST/DELETE for knowledge), introduces project import/reset endpoints, adds database health check, implements IDE open endpoint, consolidates project name validation to central utility, adds WebSocket authentication gates, introduces DatabaseHealth, KnowledgeFile*, and DeniedCommand* schemas.

**Security & Authentication**
security.py, server/utils/auth.py, server/utils/validation.py
Introduces denied command tracking with thread-safe record/retrieve/clear operations, adds HTTP Basic Auth support with environment-driven credentials, implements WebSocket-aware authentication via headers or token parameters, centralizes project name validation with compiled regex patterns, enhances security hook with pre-validation and multi-layer checking.

**Server Infrastructure & Middleware**
server/main.py, server/services/assistant_chat_session.py, server/services/expand_chat_session.py, server/services/spec_chat_session.py
Adds rate limiting via SlowAPI, integrates HTTP Basic Auth middleware, adds startup/shutdown lifecycle for scheduler and cleanup, configures MCP (Model Context Protocol) file-based config for feature management tools, defaults CLAUDE_CODE_MAX_OUTPUT_TOKENS for GLM 4.7 compatibility.

**Orchestration & Process Management**
parallel_orchestrator.py, autonomous_agent_demo.py, server/services/process_manager.py, server/services/dev_server_manager.py, server/services/terminal_manager.py, server/utils/process_utils.py
Adds Windows-safe asyncio handling with safe_asyncio_run and process group management, implements robust lock cleanup with stale/zombie process handling, adds Windows-specific process tree termination via taskkill, introduces cleanup_orphaned_agent_processes, enhances subprocess management with proper cleanup and TTL-based housekeeping.

**Logging & Metrics**
structured_logging.py
Introduces StructuredLogger with SQLite-backed log storage, WAL mode, and thread-safe writes; provides LogQuery for filtering, searching, exporting logs; supports JSON/JSONL/CSV export and timeline/agent statistics; includes cleanup with configurable max entries.

**API Exports & Consolidation**
api/__init__.py
Expands public API surface to export AgentType, AutocoderConfig, FeatureRepository, FeatureAttempt, FeatureError, get_logger, setup_logging, get_config, reload_config, get_database_path.

**Quality & Testing**
quality_gates.py, tests/conftest.py, tests/test_*.py, pyproject.toml
Introduces quality_gates module with lint/type-check/custom script runners, implements auto-detection of Python (Ruff/Flake8) and JS (ESLint/Biome) tools, provides pytest fixtures for database, async client, mocking, and projects, adds comprehensive security/async/repository test suites with 1000+ LOC coverage, configures pytest-asyncio support.

**Frontend UI Components & Hooks**
ui/src/components/IDESelectionModal.tsx, ui/src/components/ErrorBoundary.tsx, ui/src/components/ResetProjectModal.tsx, ui/src/components/ProjectSetupRequired.tsx, ui/src/App.tsx, ui/src/hooks/useProjects.ts, ui/src/lib/api.ts
Adds IDE selection modal and integration, implements error boundary for runtime error handling, introduces project reset modal with quick/full options, adds project setup required component, extends API with knowledge file and IDE endpoints, adds useResetProject and useUpdateSettings hooks, introduces IDEType type for preferred IDE tracking.

**Startup & Configuration**
start_ui.py, start_ui.sh, start_ui.bat
Enhances npm dependency staleness detection with timestamp-based logic, adds early Python venv activation in shell script, removes pause in batch launcher for streamlined startup.

**Miscellaneous Updates**
client.py, requirements.txt, .github/workflows/ci.yml, README.md, mcp_server/feature_mcp.py
Updates environment variable handling for output tokens, switches dependency pins to compatible (~=) constraints, replaces security test command with pytest invocation, adds project assistant feature documentation, expands feature MCP with regression tracking and attempt/error management.

**WebSocket & Tracking**
server/websocket.py
Implements agent TTL-based cleanup to prevent memory leaks, adds last_activity tracking with periodic stale agent removal, consolidates validation to centralized utilities, strengthens exception handling for WebSocket callbacks with specific error types.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


🐰 From my burrow, I spy a grand refactoring!
Database layers dance, authentication takes flight,
With SQLAlchemy's grace and validation alight,
Five infrastructure features stand strong as a wall,
This marvelous merge brings them all, brings them all!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'Updated project with new changes' is extremely vague and generic, using non-descriptive terms that convey no meaningful information about the actual changeset. | Replace with a specific, concise title that reflects the primary changes, such as 'Add infrastructure requirements to spec creation and expand feature management' or similar. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 96.71% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/my-updates

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot commented Jan 28, 2026

Note

Docstrings generation - SKIPPED
Skipped regeneration as there are no new commits. Docstrings already generated for this pull request at #3.


coderabbitai bot commented Jan 28, 2026

Note

Unit test generation is an Early Access feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.

1 similar comment

coderabbitai bot added a commit that referenced this pull request Jan 28, 2026
Docstrings generation was requested by @getworken.

* #2 (comment)

The following files were modified:

* `agent.py`
* `api/agent_types.py`
* `api/config.py`
* `api/connection.py`
* `api/dependency_resolver.py`
* `api/feature_repository.py`
* `api/logging_config.py`
* `api/migrations.py`
* `api/models.py`
* `autonomous_agent_demo.py`
* `client.py`
* `mcp_server/feature_mcp.py`
* `parallel_orchestrator.py`
* `progress.py`
* `quality_gates.py`
* `rate_limit_utils.py`
* `security.py`
* `server/main.py`
* `server/routers/agent.py`
* `server/routers/assistant_chat.py`
* `server/routers/devserver.py`
* `server/routers/features.py`
* `server/routers/filesystem.py`
* `server/routers/projects.py`
* `server/routers/schedules.py`
* `server/routers/settings.py`
* `server/routers/spec_creation.py`
* `server/routers/terminal.py`
* `server/schemas.py`
* `server/services/assistant_chat_session.py`
* `server/services/dev_server_manager.py`
* `server/services/expand_chat_session.py`
* `server/services/process_manager.py`
* `server/services/spec_chat_session.py`
* `server/services/terminal_manager.py`
* `server/utils/auth.py`
* `server/utils/process_utils.py`
* `server/utils/validation.py`
* `server/websocket.py`
* `start_ui.py`
* `structured_logging.py`
* `tests/conftest.py`
* `tests/test_async_examples.py`
* `tests/test_repository_and_config.py`
* `tests/test_security.py`
* `tests/test_security_integration.py`
* `ui/src/App.tsx`
* `ui/src/components/AssistantPanel.tsx`
* `ui/src/components/ConversationHistory.tsx`
* `ui/src/components/DebugLogViewer.tsx`
* `ui/src/components/IDESelectionModal.tsx`
* `ui/src/components/ProjectSelector.tsx`
* `ui/src/components/ProjectSetupRequired.tsx`
* `ui/src/components/ResetProjectModal.tsx`
* `ui/src/components/ScheduleModal.tsx`
* `ui/src/components/SettingsModal.tsx`
* `ui/src/components/ThemeSelector.tsx`
* `ui/src/hooks/useAssistantChat.ts`
* `ui/src/hooks/useConversations.ts`
* `ui/src/hooks/useProjects.ts`
* `ui/src/lib/api.ts`

coderabbitai bot commented Jan 28, 2026

Caution

The CodeRabbit agent's plans did not produce any file changes.

1 similar comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Note

Due to the large number of review comments, Critical- and Major-severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
server/routers/filesystem.py (1)

459-468: Missing normalization before path traversal check.

The directory name check for . and .. on line 464 operates on the raw input without NFKC normalization. An attacker could use Unicode fullwidth full stops (U+FF0E) to bypass this check, since a fullwidth ．． passes the literal .. comparison but normalizes to .. under NFKC.

Apply normalize_name to the directory name before the security checks for consistency with the normalization applied elsewhere in this file.

Proposed fix
     # Validate directory name
     name = request.name.strip()
+    # Normalize to prevent Unicode bypass attacks
+    name = normalize_name(name)
     if not name:
         raise HTTPException(status_code=400, detail="Directory name cannot be empty")
server/utils/process_utils.py (1)

44-156: Taskkill fallback runs after parent termination, rendering it ineffective.

The current code (lines 144–156) calls taskkill /T /PID <pid> only after the parent has already been terminated or force-killed (lines 133–142). At that point, the parent PID no longer exists, and taskkill will fail with "ERROR: The process not found"—it cannot traverse a dead parent's process tree. The /T flag only works when the parent PID currently exists in the process table.

Move the taskkill fallback earlier, while the parent still exists:

♻️ Suggested adjustment
         gone, still_alive = psutil.wait_procs(children, timeout=timeout)
         result.children_terminated = len(gone)

         logger.debug(
             "Children after graceful wait: %d terminated, %d still alive",
             len(gone), len(still_alive)
         )

+        # On Windows, use taskkill while the parent still exists if any children remain
+        if IS_WINDOWS and still_alive:
+            _kill_windows_process_tree_taskkill(proc.pid)
+
         # Force kill any remaining children
         for child in still_alive:
             try:
                 logger.debug("Force-killing child PID %d", child.pid)
                 child.kill()
                 result.children_killed += 1
             except (psutil.NoSuchProcess, psutil.AccessDenied) as e:
                 logger.debug("Child PID %d gone during force-kill: %s", child.pid, e)

         if result.children_killed > 0:
             result.status = "partial"

         # Now terminate the parent
         logger.debug("Terminating parent PID %d", proc.pid)
         proc.terminate()
         try:
             proc.wait(timeout=timeout)
             logger.debug("Parent PID %d terminated gracefully", proc.pid)
         except subprocess.TimeoutExpired:
             logger.debug("Parent PID %d did not terminate, force-killing", proc.pid)
             proc.kill()
             proc.wait()
             result.parent_forcekilled = True
             result.status = "partial"

-        # On Windows, use taskkill as a final cleanup to catch any orphans
-        # that psutil may have missed (e.g., conhost.exe, deeply nested processes)
-        if IS_WINDOWS:
-            try:
-                remaining = psutil.Process(proc.pid).children(recursive=True)
-                if remaining:
-                    logger.warning(
-                        "Found %d remaining children after psutil cleanup, using taskkill",
-                        len(remaining)
-                    )
-                    _kill_windows_process_tree_taskkill(proc.pid)
-            except psutil.NoSuchProcess:
-                pass  # Parent already dead, good
server/routers/schedules.py (1)

135-145: Add max_concurrency to Schedule initialization.

ScheduleCreate includes max_concurrency with a default of 3, but the Schedule(...) constructor on lines 135-145 doesn't pass it, causing the user-provided value to be silently ignored.

Fix
         schedule = Schedule(
             project_name=project_name,
             start_time=data.start_time,
             duration_minutes=data.duration_minutes,
             days_of_week=data.days_of_week,
             enabled=data.enabled,
             yolo_mode=data.yolo_mode,
             model=data.model,
+            max_concurrency=data.max_concurrency,
         )
server/services/expand_chat_session.py (1)

187-249: Clean up temp files when client initialization fails.

If Claude client creation throws, the session exits without calling close(), leaving the settings/MCP config files behind.

🧹 Suggested cleanup on failure
         try:
             self.client = ClaudeSDKClient(
                 options=ClaudeAgentOptions(
@@
                 )
             )
             await self.client.__aenter__()
             self._client_entered = True
         except Exception:
             logger.exception("Failed to create Claude client")
+            # Clean up temp files created earlier in start()
+            if self._settings_file and self._settings_file.exists():
+                try:
+                    self._settings_file.unlink()
+                except Exception as e:
+                    logger.warning(f"Error removing settings file: {e}")
+                finally:
+                    self._settings_file = None
+            if self._mcp_config_file and self._mcp_config_file.exists():
+                try:
+                    self._mcp_config_file.unlink()
+                except Exception as e:
+                    logger.warning(f"Error removing MCP config file: {e}")
+                finally:
+                    self._mcp_config_file = None
             yield {
                 "type": "error",
                 "content": "Failed to initialize Claude"
             }
             return
.claude/commands/create-spec.md (1)

231-273: Clarify feature-count guidance for stateless apps.

Phase 3b allows a stateless path with no DB/persistence tests, but the ranges and example totals always say “includes 5 infrastructure.” That can mislead users into adding infra features even when database is explicitly “none.” Consider conditional wording or a stateless alternative.

💡 Suggested wording tweak
- - **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure)
- - **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure)
- - **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure)
+ - **Simple apps** (todo list, calculator, notes): ~25-55 features (includes 5 infrastructure when a database is required)
+ - **Medium apps** (blog, task manager with auth): ~105 features (includes 5 infrastructure when a database is required)
+ - **Advanced apps** (e-commerce, CRM, full SaaS): ~155-205 features (includes 5 infrastructure when a database is required)

- > - **Infrastructure (required)**: 5 features (database setup, persistence verification)
+ > - **Infrastructure (required when database is needed)**: 5 features (database setup, persistence verification)

- > **Total: ~N features** (including 5 infrastructure)
+ > **Total: ~N features** (including infrastructure when applicable)
🤖 Fix all issues with AI agents
In @.claude/commands/expand-project.md:
- Around line 173-190: The fallback currently embeds a JSON array inside an XML
tag (<features_to_create>), which is invalid; replace the hybrid with a
correctly structured XML payload by converting the JSON objects into <feature>
elements (with child tags <category>, <name>, <description>, and a <steps>
container with multiple <step> entries) inside <features_to_create>, or—if the
system expects raw JSON—remove the <features_to_create> wrapper and document the
fallback as JSON; update the documentation text to explicitly state which format
the parser expects and show the corresponding valid example using either the
<features_to_create> XML structure or a plain JSON array.

In `@agent.py`:
- Around line 182-205: The current parent-process detection calls
clear_stuck_features() in both ImportError and general Exception handlers,
reintroducing the parallel-mode race; change the logic so
clear_stuck_features(project_dir) is only invoked when psutil import and parent
detection succeed and the parent_name check confirms we're not running under the
orchestrator (i.e., "python" not in parent_name.lower()); on ImportError or any
exception during detection, do NOT call clear_stuck_features — instead skip
clearing (optionally log a warning) so we avoid unsafe clearing when detection
fails; update the block around the psutil import/parent_process/name check and
the try/except handlers (references: psutil, parent_process, parent_name,
clear_stuck_features) accordingly.
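
As a rough sketch of the guarded flow described above (clear_stuck_features and project_dir come from the existing agent.py; the actual surrounding code will differ):

```python
def maybe_clear_stuck_features(project_dir) -> None:
    """Clear stale in_progress flags only when parent detection positively succeeds."""
    try:
        import psutil  # detection depends on psutil being importable
        parent = psutil.Process().parent()
        if parent is None:
            return  # detection inconclusive: skip clearing
        if "python" not in parent.name().lower():
            # Confirmed we are NOT running under the Python orchestrator
            clear_stuck_features(project_dir)  # existing helper in agent.py
    except ImportError:
        pass  # no psutil: skip clearing rather than risk the parallel-mode race
    except Exception:
        pass  # any detection failure: err on the side of not clearing
```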

In `@client.py`:
- Around line 47-51: The unconditional DEFAULT_MAX_OUTPUT_TOKENS = "131072"
should be set only when an alternative Anthropic-compatible API is configured;
change the assignment to check os.getenv("ANTHROPIC_BASE_URL") (or equivalent
config) and only set DEFAULT_MAX_OUTPUT_TOKENS to "131072" when
ANTHROPIC_BASE_URL is present, otherwise set a safer default (or leave
unset/None) to avoid exceeding standard Anthropic model limits; apply this same
guarded assignment change for DEFAULT_MAX_OUTPUT_TOKENS in client.py and the
same lines in server/services/spec_chat_session.py,
server/services/expand_chat_session.py, and
server/services/assistant_chat_session.py, referencing the
DEFAULT_MAX_OUTPUT_TOKENS constant and the ANTHROPIC_BASE_URL environment/config
variable.
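
A minimal sketch of the guarded assignment; build_env is a hypothetical illustration of where the constant would be consumed:

```python
import os

# Only raise the output-token ceiling when an alternative
# Anthropic-compatible endpoint is configured.
if os.getenv("ANTHROPIC_BASE_URL"):
    DEFAULT_MAX_OUTPUT_TOKENS = "131072"
else:
    DEFAULT_MAX_OUTPUT_TOKENS = None  # leave unset so standard model limits apply


def build_env() -> dict:
    """Assemble a subprocess environment, applying the token default only if set."""
    env = dict(os.environ)
    if DEFAULT_MAX_OUTPUT_TOKENS is not None:
        env.setdefault("CLAUDE_CODE_MAX_OUTPUT_TOKENS", DEFAULT_MAX_OUTPUT_TOKENS)
    return env
```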

In `@registry.py`:
- Around line 31-56: VALID_MODELS is currently built from CLAUDE_MODELS and
won’t include Ollama IDs when AVAILABLE_MODELS is swapped; update the code so
VALID_MODELS is derived from AVAILABLE_MODELS (or recomputed whenever
AVAILABLE_MODELS changes) instead of CLAUDE_MODELS. Locate the constants
VALID_MODELS and AVAILABLE_MODELS and change the list comprehension that builds
VALID_MODELS to use AVAILABLE_MODELS (or add a small helper/function to refresh
VALID_MODELS after detecting Ollama) so validators accept the actual active
model IDs; ensure DEFAULT_OLLAMA_MODEL and DEFAULT_MODEL remain unchanged.
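
A sketch under the assumption that model entries are dicts with an "id" key; the model IDs below are placeholders, not the real registry.py entries:

```python
CLAUDE_MODELS = [{"id": "claude-model-a"}, {"id": "claude-model-b"}]  # placeholders
OLLAMA_MODELS = [{"id": "ollama-model-a"}]  # placeholders

AVAILABLE_MODELS = CLAUDE_MODELS  # swapped to OLLAMA_MODELS when Ollama is detected


def refresh_valid_models() -> list[str]:
    """Derive the validator's ID list from whatever AVAILABLE_MODELS currently holds."""
    return [m["id"] for m in AVAILABLE_MODELS]


VALID_MODELS = refresh_valid_models()  # call again after swapping AVAILABLE_MODELS
```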

In `@security.py`:
- Around line 68-85: Replace the partial previews in the logging block so no
parts of command/reason are logged: stop calling redact_string for previews and
instead log only the deterministic hashes (command_hash, reason_hash) and the
lengths (len(command), len(reason)) or a fixed masked string; update the
logger.info call to include only those hashes/lengths or fully masked
placeholders and remove any trailing whitespace on blank lines in this block
(affecting the lines around command_hash/reason_hash, redact_string, and
logger.info) to fix CI W293.
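
A minimal sketch of hash-and-length-only logging, with no substring of either input ever reaching the log:

```python
import hashlib
import logging

logger = logging.getLogger(__name__)


def log_denied(command: str, reason: str) -> None:
    """Log deterministic hashes and lengths only; never any part of the inputs."""
    command_hash = hashlib.sha256(command.encode("utf-8")).hexdigest()[:16]
    reason_hash = hashlib.sha256(reason.encode("utf-8")).hexdigest()[:16]
    logger.info(
        "Denied command: command_hash=%s command_len=%d reason_hash=%s reason_len=%d",
        command_hash, len(command), reason_hash, len(reason),
    )
```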

In `@server/routers/projects.py`:
- Around line 710-734: The upload_knowledge_file endpoint uses file.filename
directly to build filepath and is vulnerable to path traversal; validate and
sanitize the filename before writing: ensure filename is non-empty, contains no
path separators or "..", and is not absolute (you can get a safe name via
pathlib.Path(file.filename).name or os.path.basename and compare to the original
to detect tampering), reject names with null bytes or dangerous characters, then
construct filepath = knowledge_dir / safe_name and verify
filepath.resolve().is_relative_to(knowledge_dir.resolve()) (or compare parents)
before writing; on invalid names return HTTPException(status_code=400) and keep
the rest of upload_knowledge_file, KnowledgeFileUpload, get_knowledge_dir
references intact.
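
A sketch of the validation described above, assuming the endpoint already has knowledge_dir and file in scope:

```python
from pathlib import Path

from fastapi import HTTPException, UploadFile


def safe_knowledge_path(knowledge_dir: Path, file: UploadFile) -> Path:
    """Return a write path safely contained in knowledge_dir, or raise 400."""
    raw = file.filename or ""
    safe_name = Path(raw).name  # strips directory components and drive prefixes
    if not safe_name or safe_name != raw or safe_name in (".", "..") or "\x00" in raw:
        raise HTTPException(status_code=400, detail="Invalid filename")
    filepath = (knowledge_dir / safe_name).resolve()
    if not filepath.is_relative_to(knowledge_dir.resolve()):  # Python 3.9+
        raise HTTPException(status_code=400, detail="Invalid filename")
    return filepath
```

Comparing safe_name against the original filename rejects any name that carried path separators, which covers both relative traversal and absolute paths in one check.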

In `@server/services/expand_chat_session.py`:
- Around line 379-417: The code sets mcp_tool_succeeded = True as soon as a
ToolResult/ToolResultBlock with "feature_create_bulk" is seen, preventing the
XML fallback even when the tool errored or parsing failed; change the logic in
the ToolResult/ToolResultBlock branch to first check getattr(block, "is_error",
False) and skip processing if true, then attempt JSON parsing of content and
extract created_features, and only after successfully extracting non-empty
created_features (and appending ids to self.created_feature_ids and incrementing
self.features_created) set mcp_tool_succeeded = True; ensure any
JSONDecodeError/AttributeError keeps mcp_tool_succeeded False so the subsequent
XML fallback runs.
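
A sketch of the reordered branch as a method on the session class; the block attributes and the created_features payload shape follow the description above:

```python
import json


def process_bulk_tool_result(self, block) -> None:
    """Set mcp_tool_succeeded only after feature IDs are actually extracted."""
    if getattr(block, "is_error", False):
        return  # tool errored: leave the flag False so the XML fallback runs
    try:
        payload = json.loads(block.content)
        created = payload.get("created_features") or []
    except (json.JSONDecodeError, AttributeError, TypeError):
        return  # unparseable result: fall back to XML
    if created:
        self.created_feature_ids.extend(f["id"] for f in created)
        self.features_created += len(created)
        self.mcp_tool_succeeded = True
```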

In `@server/services/process_manager.py`:
- Around line 241-268: The current logic unconditionally deletes the lock when
self.pid is None, which can remove a valid lock owned by a still-running
process; instead, when our_pid is None, read and parse the lock file into
lock_pid (as done below) and then verify liveness of lock_pid (e.g.,
os.kill(lock_pid, 0) on Unix or psutil.pid_exists/psutil.Process to check alive
and optionally inspect cmdline) before removing it; only call
self.lock_file.unlink(...) if the lock_pid is not alive or the process
command/identity does not match the expected agent, otherwise leave the lock in
place and log a warning.
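
A sketch assuming the lock file stores a bare PID, as the surrounding code implies:

```python
import logging
from pathlib import Path

import psutil

logger = logging.getLogger(__name__)


def remove_lock_if_stale(lock_file: Path) -> bool:
    """Unlink the lock only when its recorded owner is provably not alive."""
    try:
        lock_pid = int(lock_file.read_text().strip())
    except (OSError, ValueError):
        lock_file.unlink(missing_ok=True)  # unreadable/corrupt lock is safe to drop
        return True
    if psutil.pid_exists(lock_pid):
        logger.warning("Lock held by live PID %d; leaving it in place", lock_pid)
        return False
    lock_file.unlink(missing_ok=True)
    return True
```

A stricter variant could also inspect psutil.Process(lock_pid).cmdline() and only trust liveness when the command matches the expected agent.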

In `@tests/test_security.py`:
- Around line 75-123: Tests like test_extract_commands and helpers like
check_hook currently only print and return values so pytest treats them as
passing; update each test_* function (e.g., test_extract_commands and any other
test_* in this file) to assert on failures instead of returning counts by adding
an assertion such as assert failed == 0 (or assert passed == expected_count) at
the end, and update check_hook to raise an AssertionError or return a boolean
used by an assert in callers (e.g., assert check_hook(cmd, should_block) is
True) so any mismatch fails the test run; ensure each test uses pytest-style
assertions rather than print+return.
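
A sketch of the assertion-based pattern; run_security_hook stands in for whatever helper the test file already uses to invoke the hook:

```python
def check_hook(cmd: str, should_block: bool) -> bool:
    """Return True when the hook's verdict matches the expectation."""
    blocked = run_security_hook(cmd)  # assumed existing helper in this test module
    return blocked == should_block


def test_extract_commands():
    cases = [("rm -rf /", True), ("ls -la", False)]  # illustrative cases only
    failed = sum(1 for cmd, expected in cases if not check_hook(cmd, expected))
    assert failed == 0  # pytest now fails on any mismatch instead of printing
```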
🟡 Minor comments (19)
.github/workflows/ci.yml-21-22 (1)

21-22: Ensure the full test suite runs in CI.

The workflow currently runs only security tests (tests/test_security.py and tests/test_security_integration.py). No other test execution exists in the CI pipeline. Add a step to run the full test suite (or confirm if limited coverage is intentional).

start_ui.sh-33-37 (1)

33-37: Define SCRIPT_DIR before using it.
$SCRIPT_DIR isn’t set, so the pre-activation check is likely ineffective (or points to /venv).

🧩 Suggested fix
 cd "$(dirname "$0")"
+SCRIPT_DIR="$(pwd)"
 # AutoCoder UI Launcher for Unix/Linux/macOS
@@
 # Activate virtual environment if it exists
 if [ -d "$SCRIPT_DIR/venv" ]; then
.claude/templates/testing_prompt.template.md-24-27 (1)

24-27: Add a language tag to the fenced MCP block.
markdownlint MD040 expects a language identifier; text keeps it neutral here.

🛠️ Proposed fix
-```
+```text
 # 4. Get progress statistics
 Use the feature_get_stats tool
server/main.py-168-182 (1)

168-182: Handle malformed base64 in Basic Auth decoding.

`base64.b64decode()` can raise `binascii.Error` (and will for `validate=True`). Without catching it, malformed headers can bubble into 500s.

🛡️ Safer decode + explicit error handling
-import base64
+import base64
+import binascii
@@
-            decoded = base64.b64decode(encoded_credentials).decode("utf-8")
+            decoded = base64.b64decode(encoded_credentials, validate=True).decode("utf-8")
             username, password = decoded.split(":", 1)
@@
-        except (ValueError, UnicodeDecodeError):
+        except (ValueError, UnicodeDecodeError, binascii.Error):
             return Response(
                 status_code=401,
                 content="Invalid authorization header",
                 headers={"WWW-Authenticate": 'Basic realm="Autocoder"'},
             )
.claude/templates/coding_prompt.template.md-33-38 (1)

33-38: Add a language tag to the MCP tools snippet (MD040).

Line 35 triggers markdownlint; mark this fence as text or bash.

✏️ Suggested fix
-```
+```text
 # 5. Get progress statistics
 Use the feature_get_stats tool
.claude/templates/coding_prompt.template.md-472-479 (1)

472-479: Align token-efficiency guidance with the allowed tools list.

The Token Efficiency section tells agents to use `feature_get_summary`, but the “ALLOWED Feature Tools” list doesn’t include it. Either add it to the allowed list or swap the guidance to an allowed tool (e.g., `feature_get_stats`).


<details>
<summary>✏️ Suggested fix</summary>

```diff
-- **Use `feature_get_summary`** for status checks (lighter than `feature_get_by_id`)
+- **Use `feature_get_stats`** for status checks (lighter than `feature_get_by_id`)
.claude/templates/initializer_prompt.template.md-45-78 (1)

45-78: Add a language tag to the feature_create_bulk example (MD040).

markdownlint flags Line 45. Add json (or text) to the fenced block.

✏️ Suggested fix
-```
+```json
 Use the feature_create_bulk tool with features=[
   {
     "category": "functional",
     "name": "Brief feature name",
security.py-9-18 (1)

9-18: Fix import ordering per CI (I001).

CI reports the import block is unsorted at Line 9. Reorder standard imports to satisfy the linter.

🧹 Suggested fix
-import logging
-import hashlib
+import hashlib
+import logging
 import os
 import re
 import shlex
 import threading
server/routers/projects.py-461-517 (1)

461-517: Inconsistent use of cmd vs cmd_path on Unix.

On Windows (line 507), the code correctly uses cmd_path (the resolved executable path). On Unix (line 510), it uses cmd (the command name string). While this may work due to start_new_session=True and shell PATH resolution, it's inconsistent and could cause issues if the executable isn't in PATH for the subprocess.

Proposed fix for consistency
         if sys.platform == "win32":
             subprocess.Popen([cmd_path, project_path])
         else:
             # Unix-like systems
-            subprocess.Popen([cmd, project_path], start_new_session=True)
+            subprocess.Popen([cmd_path, project_path], start_new_session=True)
mcp_server/feature_mcp.py-616-624 (1)

616-624: Inconsistent behavior with feature_mark_passing regarding last_failed_at.

When tested_ok=True, this function clears last_failed_at (line 619), but feature_mark_passing (lines 268-275) does not clear last_failed_at. This inconsistency could lead to confusion when querying feature history.

Consider aligning behavior: either both functions should clear last_failed_at when marking passing, or neither should.
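
One way to keep the two paths consistent, sketched as a hypothetical shared helper:

```python
def apply_passing_state(feature, clear_failure_history: bool = True) -> None:
    """Single definition of what 'passing' means for a feature row."""
    feature.passes = True
    feature.in_progress = False
    if clear_failure_history:
        feature.last_failed_at = None  # both callers now agree on this behavior
```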

mcp_server/feature_mcp.py-1484-1486 (1)

1484-1486: last_failed_at updated on any error log, even for non-failure scenarios.

Logging an error (e.g., lint_error) updates last_failed_at, but the feature might still be passing. This could misrepresent when the feature actually failed a test. Consider only updating last_failed_at when the error type indicates an actual feature failure (e.g., test_failure), or make this behavior opt-in via a parameter.
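
A sketch of the opt-in behavior; error-type names beyond test_failure and lint_error are illustrative:

```python
def record_error(feature, error_type: str, counts_as_failure=None) -> None:
    """Update last_failed_at only when the error represents a genuine failure."""
    if counts_as_failure is None:
        counts_as_failure = error_type == "test_failure"  # lint_error etc. do not count
    if counts_as_failure:
        feature.last_failed_at = _utc_now()  # _utc_now as defined in feature_mcp.py
```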

mcp_server/feature_mcp.py-614-614 (1)

614-614: Fix whitespace issues flagged by CI.

The CI pipeline flagged whitespace issues on this line and others (lines 625, 1002, 1025). Remove trailing whitespace and whitespace on blank lines.

structured_logging.py-436-439 (1)

436-439: Use datetime.now(timezone.utc) instead of deprecated datetime.utcnow().

datetime.utcnow() is deprecated in Python 3.12+. Use datetime.now(timezone.utc) for consistency with the rest of the codebase:

         # Default to last 24 hours
         if not since:
-            since = datetime.utcnow() - timedelta(hours=24)
+            since = datetime.now(timezone.utc) - timedelta(hours=24)
         if not until:
-            until = datetime.utcnow()
+            until = datetime.now(timezone.utc)
ui/src/components/ResetProjectModal.tsx-28-44 (1)

28-44: Block close interactions while a reset is pending.

Backdrop and the header X still close the modal while resetProject.isPending is true, which can hide in-flight status/errors.

🔒 Suggested guard
-    <div className="neo-modal-backdrop" onClick={onClose}>
+    <div className="neo-modal-backdrop" onClick={resetProject.isPending ? undefined : onClose}>
...
-          <button
-            onClick={onClose}
-            className="neo-btn neo-btn-ghost p-2"
-          >
+          <button
+            onClick={onClose}
+            disabled={resetProject.isPending}
+            className="neo-btn neo-btn-ghost p-2"
+          >
             <X size={24} />
           </button>
ui/src/components/ResetProjectModal.tsx-16-25 (1)

16-25: Handle non-throwing reset failures.

resetProject returns { success, message }; if success is false but HTTP 200, the modal closes without surfacing the failure.

🐛 Suggested handling
-      await resetProject.mutateAsync({ name: projectName, fullReset })
-      onReset?.()
-      onClose()
+      const result = await resetProject.mutateAsync({ name: projectName, fullReset })
+      if (!result.success) {
+        setError(result.message || 'Failed to reset project')
+        return
+      }
+      onReset?.()
+      onClose()
tests/test_security.py-10-31 (1)

10-31: CI formatting failures: import order + trailing whitespace.

The pipeline flagged I001 and W293. Please re‑run the formatter (or manually fix import ordering and whitespace-only blank lines).

Also applies to: 67-81

server/services/assistant_chat_session.py-246-256 (1)

246-256: Unused skip_greeting parameter.

The skip_greeting parameter is documented but never actually used in the method body. The greeting decision at line 369 is based on is_new_conversation, not skip_greeting. Either remove the parameter or implement the intended behavior.

Option 1: Remove unused parameter
-    async def start(self, skip_greeting: bool = False) -> AsyncGenerator[dict, None]:
+    async def start(self) -> AsyncGenerator[dict, None]:
         """
         Initialize session with the Claude client.

         Creates a new conversation if none exists, then sends an initial greeting.
         For resumed conversations, skips the greeting since history is loaded from DB.
         Yields message chunks as they stream in.
-
-        Args:
-            skip_greeting: If True, skip sending the greeting (for resuming conversations)
         """
Option 2: Use the parameter
         # Send initial greeting only for NEW conversations
         # Resumed conversations already have history loaded from the database
-        if is_new_conversation:
+        if is_new_conversation and not skip_greeting:
server/services/assistant_chat_session.py-304-307 (1)

304-307: MCP config file lacks unique naming and cleanup.

Unlike expand_chat_session.py which uses {uuid.uuid4().hex} in the filename to prevent conflicts between concurrent sessions, this implementation uses a fixed filename .claude_mcp_config.json. Additionally, the file is not tracked for cleanup in the close() method.

Suggested fix for unique naming and cleanup
+import uuid
+
 class AssistantChatSession:
     def __init__(self, project_name: str, project_dir: Path, conversation_id: Optional[int] = None):
         ...
         self._history_loaded: bool = False
+        self._mcp_config_file: Optional[Path] = None  # Track for cleanup

     async def close(self) -> None:
         """Clean up resources and close the Claude client."""
         if self.client and self._client_entered:
             try:
                 await self.client.__aexit__(None, None, None)
             except Exception as e:
                 logger.warning(f"Error closing Claude client: {e}")
             finally:
                 self._client_entered = False
                 self.client = None
+        # Clean up MCP config file
+        if self._mcp_config_file and self._mcp_config_file.exists():
+            try:
+                self._mcp_config_file.unlink()
+            except Exception as e:
+                logger.warning(f"Error removing MCP config file: {e}")

     async def start(self, skip_greeting: bool = False) -> AsyncGenerator[dict, None]:
         ...
-        mcp_config_file = self.project_dir / ".claude_mcp_config.json"
+        mcp_config_file = self.project_dir / f".claude_mcp_config.assistant.{uuid.uuid4().hex}.json"
+        self._mcp_config_file = mcp_config_file
api/logging_config.py-63-66 (1)

63-66: Race condition in _logging_configured flag check.

The check-then-act on _logging_configured is not thread-safe. If two threads call setup_logging() simultaneously, both could pass the check before either sets the flag, leading to duplicate handlers.

🔒 Suggested fix using threading.Lock
+import threading
+
 # Track if logging has been configured
 _logging_configured = False
+_logging_lock = threading.Lock()


 def setup_logging(
     log_dir: Optional[Path] = None,
     log_file: str = DEFAULT_LOG_FILE,
     console_level: int = DEFAULT_CONSOLE_LOG_LEVEL,
     file_level: int = DEFAULT_FILE_LOG_LEVEL,
     root_level: int = DEFAULT_LOG_LEVEL,
 ) -> None:
     ...
     global _logging_configured

-    if _logging_configured:
-        return
+    with _logging_lock:
+        if _logging_configured:
+            return
+        # ... rest of the function ...
+        _logging_configured = True
🧹 Nitpick comments (30)
ui/src/components/ProjectSetupRequired.tsx (2)

48-52: Simplify handleRetryInitializer by removing redundant status update.

Setting initializerStatus to 'idle' is unnecessary since handleSpecComplete immediately sets it to 'starting'. Also, consider making this function async for clarity.

♻️ Suggested refactor
-  const handleRetryInitializer = () => {
+  const handleRetryInitializer = async () => {
     setInitializerError(null)
-    setInitializerStatus('idle')
-    handleSpecComplete('', yoloModeSelected)
+    await handleSpecComplete('', yoloModeSelected)
   }

161-172: Consider adding ARIA attributes for better accessibility.

The error notification should be announced to screen readers. Adding role="alert" ensures the error message is immediately announced when it appears.

♿ Suggested accessibility improvement
       {initializerError && (
-        <div className="mt-6 p-4 bg-[var(--color-neo-danger)] text-white border-3 border-[var(--color-neo-border)]">
+        <div 
+          role="alert"
+          className="mt-6 p-4 bg-[var(--color-neo-danger)] text-white border-3 border-[var(--color-neo-border)]"
+        >
           <p className="font-bold mb-2">Failed to start agent</p>
           <p className="text-sm">{initializerError}</p>
pyproject.toml (1)

19-28: Consider scoping DeprecationWarning filters to specific modules or tests.

Globally ignoring DeprecationWarning can mask important upgrade signals from dependencies. The asyncio_default_fixture_loop_scope option is compatible with the pinned pytest-asyncio version (0.24+, where this feature was introduced). However, applying warning filters selectively (by module or test marker) is safer and helps catch dependency upgrade issues early.

start_ui.py (1)

153-159: Guard node_modules checks against non-directory or unreadable paths.

node_modules.iterdir() will raise if the path is a file or unreadable, which aborts startup. Treat those cases as “needs install” to keep the launcher resilient.

♻️ Suggested hardening
-    if not node_modules.exists():
+    if not node_modules.exists() or not node_modules.is_dir():
         needs_install = True
-    elif not any(node_modules.iterdir()):
-        # Treat empty node_modules as stale (failed/partial install)
-        needs_install = True
-        print("  Note: node_modules is empty, reinstalling...")
+    else:
+        try:
+            if not any(node_modules.iterdir()):
+                # Treat empty node_modules as stale (failed/partial install)
+                needs_install = True
+                print("  Note: node_modules is empty, reinstalling...")
+        except OSError:
+            needs_install = True
+            print("  Note: node_modules is not accessible, reinstalling...")
ui/src/components/ErrorBoundary.tsx (1)

67-79: Consider adding an alert role for screen readers.
Since the fallback replaces the entire UI, a live region helps announce the error state to assistive tech.

♿ Optional tweak
-          <div className="max-w-2xl w-full bg-white dark:bg-gray-800 rounded-lg shadow-lg p-6">
+          <div
+            className="max-w-2xl w-full bg-white dark:bg-gray-800 rounded-lg shadow-lg p-6"
+            role="alert"
+            aria-live="assertive"
+          >
server/utils/auth.py (1)

82-104: Reject malformed base64 explicitly.

base64.b64decode() without validate=True can silently accept junk characters. Tightening this avoids ambiguous parsing.

🔍 Tighten base64 decoding
-            decoded = base64.b64decode(encoded).decode("utf-8")
+            decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
@@
-            decoded = base64.b64decode(token).decode("utf-8")
+            decoded = base64.b64decode(token, validate=True).decode("utf-8")
server/routers/settings.py (1)

174-178: Consider returning the count of cleared commands.

The clear_denied_commands() function returns the count of cleared commands, but this value is discarded. Returning it in the response would provide useful feedback to the caller.

Proposed enhancement
 @router.delete("/denied-commands")
 async def clear_denied_commands_list():
     """Clear the denied commands history."""
-    clear_denied_commands()
-    return {"status": "cleared"}
+    count = clear_denied_commands()
+    return {"status": "cleared", "count": count}
quality_gates.py (2)

39-71: Move the time import to module level.

The import time statement inside _run_command should be moved to the top of the file with other imports. Function-level imports add overhead on each call and can make dependencies harder to track.

Proposed fix

At the top of the file (around line 18):

 from datetime import datetime
 from pathlib import Path
 from typing import TypedDict
+import time

In the function (line 51):

 def _run_command(cmd: list[str], cwd: Path, timeout: int = 60) -> tuple[int, str, int]:
     ...
-    import time
     start = time.time()

292-353: Consider using datetime.now(timezone.utc) instead of datetime.utcnow().

datetime.utcnow() is deprecated in Python 3.12+ and returns a naive datetime without timezone info. Using datetime.now(timezone.utc) returns a timezone-aware datetime and is the recommended approach.

Proposed fix

At the top of the file:

-from datetime import datetime
+from datetime import datetime, timezone

In verify_quality (line 350):

-        "timestamp": datetime.utcnow().isoformat(),
+        "timestamp": datetime.now(timezone.utc).isoformat(),
server/routers/projects.py (1)

641-675: Move the datetime import to module level.

The from datetime import datetime import on line 665 inside the function should be at the module level. This causes import overhead on every call and is inconsistent with Python best practices.

Proposed fix

At the top of the file with other imports:

 import re
 import shutil
 import subprocess
 import sys
+from datetime import datetime
 from pathlib import Path

Remove the inline import on line 665.

autonomous_agent_demo.py (1)

57-87: Extract safe_asyncio_run to a shared utility module.

This function is duplicated verbatim across autonomous_agent_demo.py (lines 57-87) and parallel_orchestrator.py (lines 93-123). Extract it to server/utils/async_utils.py (following the existing pattern of process_utils.py) to reduce maintenance burden and prevent future divergence.

Both files set the Windows event loop policy before imports, so ensure the policy is applied in the entry point (autonomous_agent_demo.py) before importing the new utility module.

mcp_server/feature_mcp.py (2)

48-50: Consider importing _utc_now from api.models instead of duplicating it.

This function is identical to the one defined in api/models.py (lines 27-29). Importing it would reduce duplication and ensure consistency.

-def _utc_now() -> datetime:
-    """Return current UTC time."""
-    return datetime.now(timezone.utc)
+from api.models import _utc_now

1398-1411: Consider consolidating statistics queries into a single aggregation.

The three separate COUNT queries (total, success, failure) could be combined into one query using conditional aggregation, similar to the pattern used in feature_get_stats:

from sqlalchemy import case, func
result = session.query(
    func.count(FeatureAttempt.id).label('total'),
    func.sum(case((FeatureAttempt.outcome == "success", 1), else_=0)).label('success'),
    func.sum(case((FeatureAttempt.outcome == "failure", 1), else_=0)).label('failure')
).filter(FeatureAttempt.feature_id == feature_id).first()
structured_logging.py (1)

360-362: LIKE search without escaping special characters.

The search parameter is used directly in a LIKE query. If the search term contains % or _, they'll be interpreted as wildcards. Consider escaping these characters if literal matching is intended:

if search:
    # Escape LIKE special characters for literal matching
    escaped = search.replace('\\', '\\\\').replace('%', '\\%').replace('_', '\\_')
    conditions.append("message LIKE ? ESCAPE '\\'")
    params.append(f"%{escaped}%")
ui/src/components/IDESelectionModal.tsx (1)

56-77: Add explicit radio semantics for the IDE options.

These buttons represent an exclusive choice; adding role="radiogroup" / role="radio" (plus type="button") improves screen-reader clarity.

♿ Suggested a11y tweak
-            <div className="space-y-2">
+            <div className="space-y-2" role="radiogroup" aria-label="IDE selection">
               {IDE_OPTIONS.map((ide) => (
                 <button
+                  type="button"
+                  role="radio"
+                  aria-checked={selectedIDE === ide.id}
                   key={ide.id}
                   onClick={() => setSelectedIDE(ide.id)}
ui/src/App.tsx (1)

242-269: Consider user-facing feedback for IDE open/save failures.

Right now errors only hit console.error; a toast/snackbar would help users understand why “Open in IDE” failed.

tests/conftest.py (2)

73-88: Database session fixture lacks proper transaction isolation.

The db_session fixture calls rollback() in finally, but since temp_db already committed data during database creation, a simple rollback won't provide true test isolation. Tests using this fixture that commit changes will affect subsequent tests.

Consider using nested transactions (savepoints) for proper isolation:

Suggested fix using savepoints
 @pytest.fixture
 def db_session(temp_db: Path):
     """Get a database session for testing.

-    Provides a session that is automatically rolled back after each test.
+    Provides a session wrapped in a transaction that is rolled back after each test.
     """
     from api.database import create_database

     _, SessionLocal = create_database(temp_db)
-    session = SessionLocal()
+    connection = SessionLocal().get_bind().connect()
+    transaction = connection.begin()
+    session = SessionLocal(bind=connection)

     try:
         yield session
     finally:
-        session.rollback()
+        transaction.rollback()
+        connection.close()
         session.close()

223-255: Redundant engine cache invalidation.

The populated_db fixture yields temp_db and then calls invalidate_engine_cache(temp_db) in cleanup. However, temp_db (which populated_db depends on) already invalidates the cache in its own cleanup. This double invalidation is harmless but unnecessary.

Remove redundant cleanup
 @pytest.fixture
 def populated_db(temp_db: Path, sample_feature_data: dict) -> Generator[Path, None, None]:
     """Create a database populated with sample features.

     Returns the project directory path.
     """
     from api.database import Feature, create_database, invalidate_engine_cache

     _, SessionLocal = create_database(temp_db)
     session = SessionLocal()

     try:
         # Add sample features
         for i in range(5):
             feature = Feature(
                 priority=i + 1,
                 category=f"category_{i % 2}",
                 name=f"Feature {i + 1}",
                 description=f"Description for feature {i + 1}",
                 steps=[f"Step {j}" for j in range(3)],
                 passes=i < 2,  # First 2 features are passing
                 in_progress=i == 2,  # Third feature is in progress
             )
             session.add(feature)

         session.commit()
     finally:
         session.close()

     yield temp_db
-
-    # Dispose cached engine to prevent file locks on Windows
-    invalidate_engine_cache(temp_db)
+    # Note: temp_db fixture handles engine cleanup
progress.py (1)

146-170: Inner exception handling is overly broad.

The except sqlite3.OperationalError at line 158 catches all operational errors, but the fallback query is specifically for missing in_progress column. Other operational errors (e.g., disk I/O errors) would incorrectly trigger the fallback instead of bubbling up.

Suggested fix to check error message
             except sqlite3.OperationalError as e:
-                # Fallback for databases without in_progress column
+                # Fallback only for databases without in_progress column
+                if "in_progress" not in str(e).lower() and "no such column" not in str(e).lower():
+                    raise  # Re-raise other operational errors
                 cursor.execute("""
                     SELECT
                         COUNT(*) as total,
tests/test_repository_and_config.py (2)

18-34: Consider using the db_session fixture for cleaner tests.

Each test method manually creates a session and wraps it in try/finally. The db_session fixture from conftest.py already handles this pattern. However, since the tests need the populated_db fixture rather than temp_db, this pattern is acceptable for now.


398-422: Tests manipulate internal module state.

The tests directly set api.config._config = None to reset the singleton, which couples tests to internal implementation details. If the codebase provides a public reload_config() function (which it does, per line 417), prefer using that consistently.

Suggested improvement
     def test_get_config_singleton(self, monkeypatch, tmp_path):
         """Test that get_config returns a singleton."""
-        # Note: get_config uses the default config loading, which reads .env
-        # This test just verifies the singleton pattern works
-        import api.config
-        api.config._config = None
-
-        from api.config import get_config
+        from api.config import get_config, reload_config
+        # Reset to ensure clean state
+        reload_config()
+        
         config1 = get_config()
         config2 = get_config()

         assert config1 is config2
api/feature_repository.py (2)

32-34: Duplicated _utc_now helper function.

This helper is duplicated in mcp_server/feature_mcp.py (lines 47-49 per relevant snippets). Consider extracting to a shared utilities module to maintain DRY principle.


225-245: mark_passing overwrites completed_at even for already-passing features.

Unlike mark_in_progress which guards against redundant updates (line 218), mark_passing will update completed_at even if the feature is already passing. This could lose the original completion timestamp if called multiple times.

Consider adding a guard condition
     def mark_passing(self, feature_id: int) -> Optional[Feature]:
         ...
         feature = self.get_by_id(feature_id)
-        if feature:
+        if feature and not feature.passes:
             feature.passes = True
             feature.in_progress = False
             feature.completed_at = _utc_now()
             _commit_with_retry(self.session)
             self.session.refresh(feature)
         return feature
server/websocket.py (2)

275-278: Fire-and-forget task without exception handling.

asyncio.create_task() creates a background task but exceptions raised within cleanup_stale_agents will be silently lost if not handled. This could mask errors during cleanup.

🔧 Suggested improvement to handle task exceptions
     def _schedule_cleanup(self) -> None:
         """Schedule cleanup if needed (non-blocking)."""
         if self._should_cleanup():
-            asyncio.create_task(self.cleanup_stale_agents())
+            task = asyncio.create_task(self.cleanup_stale_agents())
+            task.add_done_callback(lambda t: t.exception() if not t.cancelled() else None)

210-212: Redundant cleanup scheduling.

_schedule_cleanup() is called at line 197 inside the lock when state changes, and then again at line 211 unconditionally after the lock is released. The second call will always duplicate work since the time-based check (_should_cleanup) hasn't changed between line 197 and 211.

♻️ Remove redundant call
-        # Periodic cleanup of stale agents (every 5 minutes)
-        self._schedule_cleanup()
         return None
api/logging_config.py (1)

170-176: Late import of os module.

The os module is imported inside the function at line 171. While this works, it's unconventional and can cause confusion. Consider moving the import to the top of the file with other imports.

♻️ Move import to top of file
 import logging
+import os
 import sys
 from logging.handlers import RotatingFileHandler
 ...

 def setup_orchestrator_logging(...):
     ...
     # Log session start
-    import os
     logger.info("=" * 60)
tests/test_security_integration.py (2)

28-69: Code duplication with tests/test_security.py.

The temporary_home context manager is duplicated from tests/test_security.py (lines 33-71). Consider extracting this to a shared test utilities module to maintain DRY principles.

♻️ Extract to shared test utilities

Create a tests/conftest.py or tests/test_utils.py:

# tests/test_utils.py
import os
import sys
from contextlib import contextmanager

@contextmanager
def temporary_home(home_path):
    """Context manager to temporarily set HOME (and Windows equivalents)."""
    # ... implementation ...

Then import in both test files:

from tests.test_utils import temporary_home

420-424: Bare except Exception swallows context.

When a test raises an exception, the traceback import happens inside the except block. Consider importing traceback at the top of the file for cleaner code.

♻️ Move import to top of file
 import asyncio
 import os
 import sys
 import tempfile
+import traceback
 from contextlib import contextmanager
 ...

         except Exception as e:
             print(f"❌ FAIL: Test raised exception: {e}")
-            import traceback
-
             traceback.print_exc()
             failed += 1
api/connection.py (2)

419-436: get_db is a generator but lacks @contextmanager decorator.

The get_db function uses yield but is documented as a "dependency for FastAPI". For FastAPI dependencies, this is correct (FastAPI handles generator dependencies). However, the function cannot be used as a regular context manager without wrapping. Consider adding a docstring note about this limitation.

The current implementation works correctly for FastAPI's dependency injection system. No code change needed, but the docstring could clarify:

def get_db() -> Session:
    """
    Dependency for FastAPI to get database session.

    Note: This is a generator dependency for FastAPI's DI system.
    For standalone usage, use get_db_session() context manager instead.
    ...
    """

43-91: Network path detection may have false negatives.

The /proc/mounts parsing checks if path_str.startswith(mount_point), but this could fail to match if the mount point has a trailing slash or if paths aren't normalized identically. Consider using Path comparison for robustness.

🔧 Suggested improvement for path comparison
                         # Check if path is under this mount point and if it's a network FS
-                        if path_str.startswith(mount_point):
+                        try:
+                            path_resolved = Path(path_str).resolve()
+                            mount_resolved = Path(mount_point).resolve()
+                            if path_resolved == mount_resolved or mount_resolved in path_resolved.parents:
+                                if fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
+                                    return True
+                        except (ValueError, OSError):
+                            # Path comparison failed, continue checking other mounts
+                            if path_str.startswith(mount_point):
                                 if fs_type in ("nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"):
                                     return True

Comment on lines +173 to +190
**IMPORTANT - XML Fallback:**
If the `feature_create_bulk` tool is unavailable or fails, output features in this XML format as a backup:

```xml
<features_to_create>
[
{
"category": "functional",
"name": "Feature name",
"description": "Description",
"steps": ["Step 1", "Step 2"]
}
]
</features_to_create>
```

The system will parse this XML and create features automatically.

⚠️ Potential issue | 🔴 Critical

Critical: Malformed XML/JSON hybrid format in fallback.

The XML fallback format is incorrectly structured. Lines 176-187 show XML tags (<features_to_create>) wrapping JSON array syntax ([{ ... }]), which is neither valid XML nor proper JSON. This hybrid format will likely fail to parse correctly.

If the system expects XML, the content should use XML elements:

<features_to_create>
  <feature>
    <category>functional</category>
    <name>Feature name</name>
    <description>Description</description>
    <steps>
      <step>Step 1</step>
      <step>Step 2</step>
    </steps>
  </feature>
</features_to_create>

If the system expects JSON wrapped in XML tags for extraction, clarify this in the documentation and explain why the XML wrapper is needed.

If JSON alone is sufficient, remove the XML tags and document it as a JSON fallback instead.
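
If the JSON-wrapped-in-XML interpretation is the intended one, the extraction step is simple enough to document alongside the format. A sketch, assuming the wrapper tag from the template (the function name is illustrative):

```python
import json
import re

def extract_features_fallback(text: str) -> list:
    """Pull a JSON array out of a <features_to_create> wrapper (sketch)."""
    match = re.search(r"<features_to_create>(.*?)</features_to_create>", text, re.DOTALL)
    if not match:
        return []
    try:
        features = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return []
    return features if isinstance(features, list) else []
```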

📋 Recommended fix using proper XML structure
 **IMPORTANT - XML Fallback:**
 If the `feature_create_bulk` tool is unavailable or fails, output features in this XML format as a backup:
 
 ```xml
 <features_to_create>
-[
-  {
-    "category": "functional",
-    "name": "Feature name",
-    "description": "Description",
-    "steps": ["Step 1", "Step 2"]
-  }
-]
+  <feature>
+    <category>functional</category>
+    <name>Feature name</name>
+    <description>Description</description>
+    <steps>
+      <step>Step 1</step>
+      <step>Step 2</step>
+    </steps>
+  </feature>
 </features_to_create>

🤖 Prompt for AI Agents

In .claude/commands/expand-project.md around lines 173-190, the fallback
currently embeds a JSON array inside an XML tag (<features_to_create>), which is
invalid; replace the hybrid with a correctly structured XML payload by
converting the JSON objects into <feature> elements (with child tags
<category>, <name>, <description>, and a <steps> container with multiple
<step> entries) inside <features_to_create>, or—if the system expects raw
JSON—remove the <features_to_create> wrapper and document the fallback as
JSON; update the documentation text to explicitly state which format the
parser expects and show the corresponding valid example using either the
<features_to_create> XML structure or a plain JSON array.




Comment on lines +182 to +205
    # IMPORTANT: Do NOT clear stuck features in parallel mode!
    # The orchestrator manages feature claiming atomically.
    # Clearing here causes race conditions where features are marked in_progress
    # by the orchestrator but immediately cleared by the agent subprocess on startup.
    #
    # For single-agent mode or manual runs, clearing is still safe because
    # there's only one agent at a time and it happens before claiming any features.
    #
    # Only clear if we're NOT in a parallel orchestrator context
    # (detected by checking if this agent is a subprocess spawned by orchestrator)
    try:
        import psutil
        parent_process = psutil.Process().parent()
        parent_name = parent_process.name() if parent_process else ""

        # Only clear if parent is NOT python (i.e., we're running manually, not from orchestrator)
        if "python" not in parent_name.lower():
            clear_stuck_features(project_dir)
    except (ImportError, ModuleNotFoundError):
        # psutil not available - assume single-agent mode and clear
        clear_stuck_features(project_dir)
    except Exception:
        # If parent process check fails, err on the safe side and clear
        clear_stuck_features(project_dir)
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and read agent.py around lines 182-205
fd -type f -name "agent.py" | head -5

Repository: getworken/autocoder-coderabbit

Length of output: 244


🏁 Script executed:

# Fix the fd command syntax
fd --type file -name "agent.py"

Repository: getworken/autocoder-coderabbit

Length of output: 304


🏁 Script executed:

# Simple search for agent.py
find . -name "agent.py" -type f

Repository: getworken/autocoder-coderabbit

Length of output: 110


🏁 Script executed:

# Read agent.py around lines 182-205
cat -n agent.py | sed -n '175,215p'

Repository: getworken/autocoder-coderabbit

Length of output: 2177


🏁 Script executed:

# Check if psutil is in requirements/dependencies
find . -name "requirements*.txt" -o -name "setup.py" -o -name "pyproject.toml" -o -name "poetry.lock" | head -10

Repository: getworken/autocoder-coderabbit

Length of output: 109


🏁 Script executed:

# Check requirements.txt and pyproject.toml for psutil
echo "=== requirements.txt ===" && cat requirements.txt && echo -e "\n=== pyproject.toml ===" && cat pyproject.toml

Repository: getworken/autocoder-coderabbit

Length of output: 1220


🏁 Script executed:

# Find clear_stuck_features function definition
rg -n "def clear_stuck_features" .

Repository: getworken/autocoder-coderabbit

Length of output: 56


🏁 Script executed:

# Search more broadly for clear_stuck_features
rg "clear_stuck_features" -n

Repository: getworken/autocoder-coderabbit

Length of output: 280


🏁 Script executed:

# Check the imports in agent.py to see where clear_stuck_features comes from
cat -n agent.py | sed -n '1,50p'

Repository: getworken/autocoder-coderabbit

Length of output: 1736


🏁 Script executed:

# Find and examine clear_stuck_features in progress.py
find . -name "progress.py" -type f

Repository: getworken/autocoder-coderabbit

Length of output: 87


🏁 Script executed:

# Read progress.py and find the clear_stuck_features function
rg -n "def clear_stuck_features" progress.py -A 30

Repository: getworken/autocoder-coderabbit

Length of output: 56


🏁 Script executed:

# Search for clear_stuck_features in progress.py with broader context
rg "clear_stuck_features" progress.py -A 20 -B 2

Repository: getworken/autocoder-coderabbit

Length of output: 56


🏁 Script executed:

# List all contents of progress.py to find the function
cat progress.py | head -200

Repository: getworken/autocoder-coderabbit

Length of output: 6644


🏁 Script executed:

# Search for clear_stuck_features in the entire progress.py file
wc -l progress.py && echo "---" && rg -n "clear" progress.py

Repository: getworken/autocoder-coderabbit

Length of output: 93


🏁 Script executed:

# Check the full progress.py file to see if clear_stuck_features is defined
cat progress.py | tail -150

Repository: getworken/autocoder-coderabbit

Length of output: 5709


🏁 Script executed:

# Search entire codebase for clear_stuck_features definition
rg "def clear_stuck_features" .

Repository: getworken/autocoder-coderabbit

Length of output: 56


Skip clearing stuck features when orchestrator detection fails.

Both the except (ImportError, ModuleNotFoundError) and except Exception blocks fall back to calling clear_stuck_features(), which reintroduces the parallel-mode race condition the code explicitly guards against. While psutil is a required dependency (so ImportError is unlikely), the general Exception handler poses a real risk: if parent process detection fails for any reason, clearing still proceeds. Safer approach: only clear when detection succeeds and confirms we're not in parallel mode; skip clearing otherwise.

🛠️ Suggested safer fallback
-    try:
-        import psutil
-        parent_process = psutil.Process().parent()
-        parent_name = parent_process.name() if parent_process else ""
-
-        # Only clear if parent is NOT python (i.e., we're running manually, not from orchestrator)
-        if "python" not in parent_name.lower():
-            clear_stuck_features(project_dir)
-    except (ImportError, ModuleNotFoundError):
-        # psutil not available - assume single-agent mode and clear
-        clear_stuck_features(project_dir)
-    except Exception:
-        # If parent process check fails, err on the safe side and clear
-        clear_stuck_features(project_dir)
+    should_clear = False
+    try:
+        import psutil
+        parent_process = psutil.Process().parent()
+        parent_name = parent_process.name() if parent_process else ""
+
+        # Only clear if parent is NOT python (i.e., running manually, not from orchestrator)
+        should_clear = "python" not in parent_name.lower()
+    except (ImportError, ModuleNotFoundError):
+        logger.warning("psutil not available; skipping clear_stuck_features to avoid parallel-mode races")
+    except Exception:
+        logger.warning("Unable to determine parent process; skipping clear_stuck_features", exc_info=True)
+
+    if should_clear:
+        clear_stuck_features(project_dir)
🤖 Prompt for AI Agents
In agent.py around lines 182-205, the current parent-process detection calls
clear_stuck_features() in both ImportError and general Exception handlers,
reintroducing the parallel-mode race; change the logic so
clear_stuck_features(project_dir) is only invoked when psutil import and parent
detection succeed and the parent_name check confirms we're not running under the
orchestrator (i.e., "python" not in parent_name.lower()); on ImportError or any
exception during detection, do NOT call clear_stuck_features — instead skip
clearing (optionally log a warning) so we avoid unsafe clearing when detection
fails; update the block around the psutil import/parent_process/name check and
the try/except handlers (references: psutil, parent_process, parent_name,
clear_stuck_features) accordingly.

Comment on lines +47 to +51
"CLAUDE_CODE_MAX_OUTPUT_TOKENS", # Max output tokens (default 32000, GLM 4.7 supports 131072)
]

# Default max output tokens for GLM 4.7 compatibility (131k output limit)
DEFAULT_MAX_OUTPUT_TOKENS = "131072"
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find and explore the client.py file
find . -name "client.py" -type f | head -20

Repository: getworken/autocoder-coderabbit

Length of output: 85


🏁 Script executed:

#!/bin/bash
# List the structure to understand the repository
ls -la | head -30

Repository: getworken/autocoder-coderabbit

Length of output: 1927


🏁 Script executed:

#!/bin/bash
# Look for the specific lines mentioned in the review
find . -name "client.py" -type f -exec wc -l {} \;

Repository: getworken/autocoder-coderabbit

Length of output: 89


🌐 Web query:

Anthropic Claude API maximum output tokens limit specifications

💡 Result:

In the Anthropic Messages API, the max_tokens parameter sets the maximum number of output tokens Claude may generate, and the maximum allowed value depends on the model. (docs.anthropic.com)

Current Anthropic API max output limits (per response)

  • Claude Sonnet 4 (claude-sonnet-4-20250514): 64,000 tokens (docs.anthropic.com)
  • Claude Sonnet 3.7 (claude-3-7-sonnet-20250219): 64,000 tokens (optionally 128,000 with beta header output-128k-2025-02-19) (docs.anthropic.com)
  • Claude Opus 4 / 4.1 (claude-opus-4-20250514, claude-opus-4-1-20250805): 32,000 tokens (docs.anthropic.com)
  • Claude 3.5 Sonnet (claude-3-5-sonnet-20241022, claude-3-5-sonnet-20240620): 8,192 tokens (docs.anthropic.com)
  • Claude 3.5 Haiku (claude-3-5-haiku-20241022): 8,192 tokens (docs.anthropic.com)
  • Claude 3 Haiku (claude-3-haiku-20240307): 4,096 tokens (docs.anthropic.com)

Note: If you enable extended thinking, those thinking tokens count toward your max_tokens budget. (docs.anthropic.com)



🌐 Web query:

GLM 4.7 maximum output tokens limit specifications

💡 Result:

GLM‑4.7’s maximum output (completion) length is 128,000 tokens (i.e., you can generate up to 128K output tokens in a single response). [1]

Notes:

  • GLM‑4.7’s context length is 200K tokens, so in typical chat usage your prompt tokens + generated output tokens must fit within the 200K context window. [1]
  • Some third‑party hosts/listings show a slightly different cap (e.g., “Max Tokens 131K”), which can reflect provider-specific serving limits rather than the base model spec. [2]

Sources: [1] Z.AI developer docs (GLM‑4.7 overview) • [2] SiliconFlow model card/listing for GLM‑4.7


🏁 Script executed:

#!/bin/bash
cat -n client.py | sed -n '40,60p'

Repository: getworken/autocoder-coderabbit

Length of output: 1150


🏁 Script executed:

#!/bin/bash
cat -n client.py | sed -n '295,315p'

Repository: getworken/autocoder-coderabbit

Length of output: 1074


🏁 Script executed:

#!/bin/bash
# Check if ANTHROPIC_BASE_URL is used elsewhere in the codebase
rg "ANTHROPIC_BASE_URL" --context 2

Repository: getworken/autocoder-coderabbit

Length of output: 6163


🏁 Script executed:

#!/bin/bash
# Search for where 131072 is used or documented
rg "131072|131k|131K" -i

Repository: getworken/autocoder-coderabbit

Length of output: 1298


🏁 Script executed:

#!/bin/bash
# Check if there's any documentation about this value
grep -r "131" . --include="*.md" --include="*.txt" | grep -i token

Repository: getworken/autocoder-coderabbit

Length of output: 56


Guard the 131k default to alternative APIs only.

Setting CLAUDE_CODE_MAX_OUTPUT_TOKENS to 131,072 unconditionally will exceed Anthropic model limits (which range from 4K to 64K) and cause request failures when using the standard Anthropic API. This default is only appropriate for alternative APIs like GLM.

Modify lines 303-305 to set the default only when ANTHROPIC_BASE_URL is configured:

🔧 Proposed fix
-    # Set default max output tokens for GLM 4.7 compatibility if not already set
-    if "CLAUDE_CODE_MAX_OUTPUT_TOKENS" not in sdk_env:
-        sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = DEFAULT_MAX_OUTPUT_TOKENS
+    # Set default max output tokens only for alternative APIs (e.g., GLM)
+    if "CLAUDE_CODE_MAX_OUTPUT_TOKENS" not in sdk_env and os.getenv("ANTHROPIC_BASE_URL"):
+        sdk_env["CLAUDE_CODE_MAX_OUTPUT_TOKENS"] = DEFAULT_MAX_OUTPUT_TOKENS

Apply the same fix to server/services/spec_chat_session.py, server/services/expand_chat_session.py, and server/services/assistant_chat_session.py (lines 303-305 in each file).

🤖 Prompt for AI Agents
In client.py around lines 47-51, the unconditional DEFAULT_MAX_OUTPUT_TOKENS
= "131072" should be set only when an alternative Anthropic-compatible API is
configured; change the assignment to check os.getenv("ANTHROPIC_BASE_URL") (or
equivalent config) and only set DEFAULT_MAX_OUTPUT_TOKENS to "131072" when
ANTHROPIC_BASE_URL is present, otherwise set a safer default (or leave
unset/None) to avoid exceeding standard Anthropic model limits; apply this same
guarded assignment change for DEFAULT_MAX_OUTPUT_TOKENS in client.py and the
same lines in server/services/spec_chat_session.py,
server/services/expand_chat_session.py, and
server/services/assistant_chat_session.py, referencing the
DEFAULT_MAX_OUTPUT_TOKENS constant and the ANTHROPIC_BASE_URL environment/config
variable.

Comment on lines +31 to +56
 # Available models with display names (Claude models)
 # To add a new model: add an entry here with {"id": "model-id", "name": "Display Name"}
-AVAILABLE_MODELS = [
+CLAUDE_MODELS = [
     {"id": "claude-opus-4-5-20251101", "name": "Claude Opus 4.5"},
     {"id": "claude-sonnet-4-5-20250929", "name": "Claude Sonnet 4.5"},
 ]

+# Common Ollama models for local inference
+OLLAMA_MODELS = [
+    {"id": "llama3.3:70b", "name": "Llama 3.3 70B"},
+    {"id": "llama3.2:latest", "name": "Llama 3.2"},
+    {"id": "codellama:34b", "name": "Code Llama 34B"},
+    {"id": "deepseek-coder:33b", "name": "DeepSeek Coder 33B"},
+    {"id": "qwen2.5:72b", "name": "Qwen 2.5 72B"},
+    {"id": "mistral:latest", "name": "Mistral"},
+]
+
+# Default to Claude models (will be overridden if Ollama is detected)
+AVAILABLE_MODELS = CLAUDE_MODELS
+
 # List of valid model IDs (derived from AVAILABLE_MODELS)
-VALID_MODELS = [m["id"] for m in AVAILABLE_MODELS]
+VALID_MODELS = [m["id"] for m in CLAUDE_MODELS]

 # Default model and settings
 DEFAULT_MODEL = "claude-opus-4-5-20251101"
+DEFAULT_OLLAMA_MODEL = "llama3.3:70b"
⚠️ Potential issue | 🟠 Major

VALID_MODELS won’t reflect Ollama availability.

VALID_MODELS is derived from CLAUDE_MODELS, but AVAILABLE_MODELS is designed to switch to Ollama when detected. That means validators using VALID_MODELS will still reject Ollama IDs. Please derive VALID_MODELS from the active/combined model set, or ensure it’s updated whenever AVAILABLE_MODELS changes.

🔧 Suggested fix
-# List of valid model IDs (derived from AVAILABLE_MODELS)
-VALID_MODELS = [m["id"] for m in CLAUDE_MODELS]
+# List of valid model IDs (derived from available model catalogs)
+VALID_MODELS = [m["id"] for m in (CLAUDE_MODELS + OLLAMA_MODELS)]
🤖 Prompt for AI Agents
In registry.py around lines 31-56, VALID_MODELS is currently built from
CLAUDE_MODELS and won’t include Ollama IDs when AVAILABLE_MODELS is swapped;
update the code so VALID_MODELS is derived from AVAILABLE_MODELS (or recomputed
whenever AVAILABLE_MODELS changes) instead of CLAUDE_MODELS. Locate the
constants VALID_MODELS and AVAILABLE_MODELS and change the list comprehension
that builds VALID_MODELS to use AVAILABLE_MODELS (or add a small helper/function
to refresh VALID_MODELS after detecting Ollama) so validators accept the actual
active model IDs; ensure DEFAULT_OLLAMA_MODEL and DEFAULT_MODEL remain
unchanged.
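
If the dynamic approach is preferred over combining both catalogs, a small refresh helper would keep the two in sync (a sketch; assumes AVAILABLE_MODELS is reassigned at detection time):

```python
def refresh_valid_models() -> list:
    """Recompute VALID_MODELS from the active catalog (sketch).

    Call after AVAILABLE_MODELS is swapped (e.g., when Ollama is detected)
    so validators accept the currently active model IDs.
    """
    global VALID_MODELS
    VALID_MODELS = [m["id"] for m in AVAILABLE_MODELS]
    return VALID_MODELS
```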

Comment on lines +68 to +85
    # Redact sensitive data before logging to prevent secret leakage
    # Use deterministic hash for identification without exposing content
    command_hash = hashlib.sha256(command.encode('utf-8')).hexdigest()[:16]
    reason_hash = hashlib.sha256(reason.encode('utf-8')).hexdigest()[:16]

    # Create redacted preview (first 20 + last 20 chars with mask in between)
    def redact_string(s: str, max_preview: int = 20) -> str:
        if len(s) <= max_preview * 2:
            return s[:max_preview] + "..." if len(s) > max_preview else s
        return f"{s[:max_preview]}...{s[-max_preview:]}"

    command_preview = redact_string(command, 20)
    reason_preview = redact_string(reason, 20)

    logger.info(
        f"[SECURITY] Command denied (hash: {command_hash}): {command_preview} "
        f"Reason (hash: {reason_hash}): {reason_preview}"
    )
⚠️ Potential issue | 🟠 Major

Avoid logging denied command previews to prevent secret leakage.

Even with hashing, Line 73+ logs partial command/reason strings. Secrets can easily appear in the first 20 characters (e.g., tokens in headers). Prefer logging only hashes/lengths or fully masked strings. Also strip trailing whitespace on blank lines in this block (CI W293 at Line 67/72/78/81).

🔒 Suggested fix
-    # Create redacted preview (first 20 + last 20 chars with mask in between)
-    def redact_string(s: str, max_preview: int = 20) -> str:
-        if len(s) <= max_preview * 2:
-            return s[:max_preview] + "..." if len(s) > max_preview else s
-        return f"{s[:max_preview]}...{s[-max_preview:]}"
-    
-    command_preview = redact_string(command, 20)
-    reason_preview = redact_string(reason, 20)
-    
-    logger.info(
-        f"[SECURITY] Command denied (hash: {command_hash}): {command_preview} "
-        f"Reason (hash: {reason_hash}): {reason_preview}"
-    )
+    logger.info(
+        "[SECURITY] Command denied (hash: %s, len: %d). Reason hash: %s, len: %d",
+        command_hash, len(command), reason_hash, len(reason)
+    )
🧰 Tools
🪛 GitHub Actions: CI

[error] 72-72: W293: Blank line contains whitespace


[error] 78-78: W293: Blank line contains whitespace


[error] 81-81: W293: Blank line contains whitespace

🤖 Prompt for AI Agents
In security.py around lines 68-85, replace the partial previews in the
logging block so no parts of command/reason are logged: stop calling
redact_string for previews and instead log only the deterministic hashes
(command_hash, reason_hash) and the lengths (len(command), len(reason)) or a
fixed masked string; update the logger.info call to include only those
hashes/lengths or fully masked placeholders and remove any trailing whitespace
on blank lines in this block (affecting the lines around
command_hash/reason_hash, redact_string, and logger.info) to fix CI W293.

Comment on lines +710 to +734
@router.post("/{name}/knowledge", response_model=KnowledgeFileContent)
async def upload_knowledge_file(name: str, file: KnowledgeFileUpload):
    """Upload a knowledge file to a project."""
    _init_imports()
    _, _, get_project_path, _, _ = _get_registry_functions()

    name = validate_project_name(name)
    project_dir = get_project_path(name)

    if not project_dir:
        raise HTTPException(status_code=404, detail=f"Project '{name}' not found")

    if not project_dir.exists():
        raise HTTPException(status_code=404, detail="Project directory not found")

    knowledge_dir = get_knowledge_dir(project_dir)
    knowledge_dir.mkdir(parents=True, exist_ok=True)

    filepath = knowledge_dir / file.filename

    try:
        filepath.write_text(file.content, encoding="utf-8")
        return KnowledgeFileContent(name=file.filename, content=file.content)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Failed to write file: {e}")
⚠️ Potential issue | 🟠 Major

Missing filename validation in upload endpoint.

The upload_knowledge_file endpoint doesn't validate file.filename before using it to construct the file path. While the KnowledgeFileUpload schema may have validation, the endpoint should defensively validate the filename to prevent path traversal via specially crafted filenames.

Proposed fix
 @router.post("/{name}/knowledge", response_model=KnowledgeFileContent)
 async def upload_knowledge_file(name: str, file: KnowledgeFileUpload):
     """Upload a knowledge file to a project."""
     _init_imports()
     _, _, get_project_path, _, _ = _get_registry_functions()

     name = validate_project_name(name)
     project_dir = get_project_path(name)

     if not project_dir:
         raise HTTPException(status_code=404, detail=f"Project '{name}' not found")

     if not project_dir.exists():
         raise HTTPException(status_code=404, detail="Project directory not found")

+    # Validate filename (prevent path traversal)
+    if not re.match(r'^[a-zA-Z0-9_\-\.]+\.md$', file.filename):
+        raise HTTPException(status_code=400, detail="Invalid filename. Must be alphanumeric with .md extension")
+
     knowledge_dir = get_knowledge_dir(project_dir)
     knowledge_dir.mkdir(parents=True, exist_ok=True)

     filepath = knowledge_dir / file.filename
🤖 Prompt for AI Agents
In server/routers/projects.py around lines 710-734, the upload_knowledge_file
endpoint uses file.filename directly to build filepath and is vulnerable to path
traversal; validate and sanitize the filename before writing: ensure filename is
non-empty, contains no path separators or "..", and is not absolute (you can get
a safe name via pathlib.Path(file.filename).name or os.path.basename and compare
to the original to detect tampering), reject names with null bytes or dangerous
characters, then construct filepath = knowledge_dir / safe_name and verify
filepath.resolve().is_relative_to(knowledge_dir.resolve()) (or compare parents)
before writing; on invalid names return HTTPException(status_code=400) and keep
the rest of upload_knowledge_file, KnowledgeFileUpload, get_knowledge_dir
references intact.
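
The containment check described above might look like this (a sketch; safe_knowledge_path is a hypothetical helper, and Path.is_relative_to requires Python 3.9+):

```python
from pathlib import Path
from fastapi import HTTPException

def safe_knowledge_path(knowledge_dir: Path, filename: str) -> Path:
    """Reject traversal attempts; return a path safely inside knowledge_dir (sketch)."""
    safe_name = Path(filename).name  # strips any directory components
    if not safe_name or safe_name != filename or "\x00" in filename:
        raise HTTPException(status_code=400, detail="Invalid filename")
    candidate = (knowledge_dir / safe_name).resolve()
    if not candidate.is_relative_to(knowledge_dir.resolve()):
        raise HTTPException(status_code=400, detail="Invalid filename")
    return candidate
```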

Comment on lines +379 to +417
                    # Detect successful feature_create_bulk tool calls
                    # Handle both ToolResult and ToolResultBlock naming conventions
                    elif block_type in ("ToolResultBlock", "ToolResult"):
                        # Try to get tool name from tool_use_id correlation or direct attribute
                        tool_use_id = getattr(block, "tool_use_id", None)
                        tool_name = tool_use_map.get(tool_use_id, "") or getattr(block, "tool_name", "")
                        if "feature_create_bulk" in tool_name:
                            mcp_tool_succeeded = True
                            logger.info("Detected successful feature_create_bulk MCP tool call")

                            # Extract created features from tool result
                            tool_content = getattr(block, "content", [])
                            if tool_content:
                                for content_block in tool_content:
                                    if hasattr(content_block, "text"):
                                        try:
                                            result_data = json.loads(content_block.text)
                                            created_features = result_data.get("created_features", [])

                                            if created_features:
                                                self.features_created += len(created_features)
                                                # Safely extract feature IDs, filtering out any without valid IDs
                                                self.created_feature_ids.extend(
                                                    [f.get("id") for f in created_features if f.get("id") is not None]
                                                )

                                                yield {
                                                    "type": "features_created",
                                                    "count": len(created_features),
                                                    "features": created_features,
                                                    "source": "mcp"  # Tag source for debugging
                                                }

                                                logger.info(f"Created {len(created_features)} features for {self.project_name} (via MCP)")
                                        except (json.JSONDecodeError, AttributeError) as e:
                                            logger.warning(f"Failed to parse MCP tool result: {e}")

        # Only parse XML if MCP tool wasn't used (fallback mechanism)
        if not mcp_tool_succeeded:
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n server/services/expand_chat_session.py | sed -n '370,430p'

Repository: getworken/autocoder-coderabbit

Length of output: 4183


🏁 Script executed:

head -50 server/services/expand_chat_session.py

Repository: getworken/autocoder-coderabbit

Length of output: 1417


🏁 Script executed:

rg -A 3 -B 3 "is_error|ToolResult.*error" server/services/ 2>/dev/null | head -50

Repository: getworken/autocoder-coderabbit

Length of output: 991


🏁 Script executed:

rg "getattr.*error" server/services/spec_chat_session.py -B 2 -A 2

Repository: getworken/autocoder-coderabbit

Length of output: 579


🏁 Script executed:

rg -B 5 "mcp_tool_succeeded" server/services/expand_chat_session.py | head -40

Repository: getworken/autocoder-coderabbit

Length of output: 1180


Gate MCP success on error checking and actual feature extraction to preserve XML fallback.

mcp_tool_succeeded is set to True immediately upon seeing a feature_create_bulk ToolResult block (line 386), before validating that the result is not an error. If the ToolResult has is_error=True or content parsing fails, XML parsing is skipped and features won't be created via either path.

Move the mcp_tool_succeeded = True assignment to occur only after successfully extracting created_features, and check is_error before processing. This preserves the XML fallback mechanism when the MCP tool fails or returns an error response.

Suggested fix pattern
-                    elif block_type in ("ToolResultBlock", "ToolResult"):
+                    elif block_type in ("ToolResultBlock", "ToolResult"):
                         # Try to get tool name from tool_use_id correlation or direct attribute
                         tool_use_id = getattr(block, "tool_use_id", None)
                         tool_name = tool_use_map.get(tool_use_id, "") or getattr(block, "tool_name", "")
-                        if "feature_create_bulk" in tool_name:
-                            mcp_tool_succeeded = True
-                            logger.info("Detected successful feature_create_bulk MCP tool call")
+                        is_error = getattr(block, "is_error", False) or bool(getattr(block, "error", None))
+                        if "feature_create_bulk" in tool_name and not is_error:
+                            logger.info("Detected feature_create_bulk MCP tool call")
 
                             # Extract created features from tool result
                             tool_content = getattr(block, "content", [])
                             if tool_content:
+                                parsed_any = False
                                 for content_block in tool_content:
                                     if hasattr(content_block, "text"):
                                         try:
                                             result_data = json.loads(content_block.text)
+                                            if result_data.get("error"):
+                                                continue
                                             created_features = result_data.get("created_features", [])
 
                                             if created_features:
+                                                parsed_any = True
                                                 self.features_created += len(created_features)
                                                 # Safely extract feature IDs, filtering out any without valid IDs
                                                 self.created_feature_ids.extend(
                                                     [f.get("id") for f in created_features if f.get("id") is not None]
                                                 )
 
                                                 yield {
                                                     "type": "features_created",
                                                     "count": len(created_features),
                                                     "features": created_features,
                                                     "source": "mcp"  # Tag source for debugging
                                                 }
 
                                                 logger.info(f"Created {len(created_features)} features for {self.project_name} (via MCP)")
                                         except (json.JSONDecodeError, AttributeError) as e:
                                             logger.warning(f"Failed to parse MCP tool result: {e}")
+                                if parsed_any:
+                                    mcp_tool_succeeded = True
🤖 Prompt for AI Agents
In server/services/expand_chat_session.py around lines 379-417, the code sets
mcp_tool_succeeded = True as soon as a ToolResult/ToolResultBlock with
"feature_create_bulk" is seen, preventing the XML fallback even when the tool
errored or parsing failed; change the logic in the ToolResult/ToolResultBlock
branch to first check getattr(block, "is_error", False) and skip processing if
true, then attempt JSON parsing of content and extract created_features, and
only after successfully extracting non-empty created_features (and appending ids
to self.created_feature_ids and incrementing self.features_created) set
mcp_tool_succeeded = True; ensure any JSONDecodeError/AttributeError keeps
mcp_tool_succeeded False so the subsequent XML fallback runs.

Comment on lines +241 to +268
        if not self.lock_file.exists():
            return

        try:
            # Read lock file to verify it's ours
            lock_content = self.lock_file.read_text().strip()

            # Check if we own this lock
            our_pid = self.pid
            if our_pid is None:
                # We don't have a running process, but lock exists
                # This is unexpected - remove it anyway
                self.lock_file.unlink(missing_ok=True)
                logger.debug("Removed orphaned lock file (no running process)")
                return

            # Parse lock content
            if ":" in lock_content:
                lock_pid_str, _ = lock_content.split(":", 1)
                lock_pid = int(lock_pid_str)
            else:
                lock_pid = int(lock_content)

            # If lock PID matches our process, remove it
            if lock_pid == our_pid:
                self.lock_file.unlink(missing_ok=True)
                logger.debug(f"Removed lock file for our process (PID {our_pid})")
            else:
⚠️ Potential issue | 🟠 Major

Avoid removing a valid lock when no local process handle exists.

If the server restarts (or the manager loses its process handle), self.pid is None and the lock is removed unconditionally. That can drop a lock for a still-running agent and allow duplicate processes. Consider verifying the lock PID's liveness/command before removal.

🛠️ Proposed fix
-            # Check if we own this lock
-            our_pid = self.pid
-            if our_pid is None:
-                # We don't have a running process, but lock exists
-                # This is unexpected - remove it anyway
-                self.lock_file.unlink(missing_ok=True)
-                logger.debug("Removed orphaned lock file (no running process)")
-                return
-
-            # Parse lock content
-            if ":" in lock_content:
-                lock_pid_str, _ = lock_content.split(":", 1)
-                lock_pid = int(lock_pid_str)
-            else:
-                lock_pid = int(lock_content)
+            # Parse lock content
+            if ":" in lock_content:
+                lock_pid_str, _ = lock_content.split(":", 1)
+                lock_pid = int(lock_pid_str)
+            else:
+                lock_pid = int(lock_content)
+
+            # Check if we own this lock
+            our_pid = self.pid
+            if our_pid is None:
+                # No local process handle — only remove if the lock PID is stale
+                if psutil.pid_exists(lock_pid):
+                    try:
+                        proc = psutil.Process(lock_pid)
+                        cmdline = " ".join(proc.cmdline())
+                        if "autonomous_agent_demo.py" in cmdline:
+                            logger.debug(
+                                "Lock belongs to a running agent (PID %s); leaving intact",
+                                lock_pid,
+                            )
+                            return
+                    except (psutil.NoSuchProcess, psutil.AccessDenied):
+                        pass
+                self.lock_file.unlink(missing_ok=True)
+                logger.debug("Removed orphaned lock file (no local process)")
+                return
🤖 Prompt for AI Agents
In server/services/process_manager.py around lines 241-268, the current logic
unconditionally deletes the lock when self.pid is None, which can remove a valid
lock owned by a still-running process; instead, when our_pid is None, read and
parse the lock file into lock_pid (as done below) and then verify liveness of
lock_pid (e.g., os.kill(lock_pid, 0) on Unix or psutil.pid_exists/psutil.Process
to check alive and optionally inspect cmdline) before removing it; only call
self.lock_file.unlink(...) if the lock_pid is not alive or the process
command/identity does not match the expected agent, otherwise leave the lock in
place and log a warning.
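
The os.kill-based liveness probe mentioned above is a one-liner on Unix (a sketch; Windows callers would need psutil.pid_exists instead):

```python
import os

def pid_alive(pid: int) -> bool:
    """Return True if pid exists (Unix); signal 0 performs no delivery."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # exists, but owned by another user
    return True
```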

Comment on lines +75 to +123
def check_hook(command: str, should_block: bool) -> bool:
    """Check a single command against the security hook (helper function)."""
    input_data = {"tool_name": "Bash", "tool_input": {"command": command}}
    result = asyncio.run(bash_security_hook(input_data))
    was_blocked = result.get("decision") == "block"

    if was_blocked == should_block:
        status = "PASS"
    else:
        status = "FAIL"
        expected = "blocked" if should_block else "allowed"
        actual = "blocked" if was_blocked else "allowed"
        reason = result.get("reason", "")
        print(f" {status}: {command!r}")
        print(f" Expected: {expected}, Got: {actual}")
        if reason:
            print(f" Reason: {reason}")
        return False

    print(f" {status}: {command!r}")
    return True


def test_extract_commands():
    """Test the command extraction logic."""
    print("\nTesting command extraction:\n")
    passed = 0
    failed = 0

    test_cases = [
        ("ls -la", ["ls"]),
        ("npm install && npm run build", ["npm", "npm"]),
        ("cat file.txt | grep pattern", ["cat", "grep"]),
        ("/usr/bin/node script.js", ["node"]),
        ("VAR=value ls", ["ls"]),
        ("git status || git init", ["git", "git"]),
    ]

    for cmd, expected in test_cases:
        result = extract_commands(cmd)
        if result == expected:
            print(f" PASS: {cmd!r} -> {result}")
            passed += 1
        else:
            print(f" FAIL: {cmd!r}")
            print(f" Expected: {expected}, Got: {result}")
            failed += 1

    return passed, failed
⚠️ Potential issue | 🟠 Major

Pytest will treat these tests as always passing.

The test_* functions only print and return counts; pytest ignores return values, so failures won’t fail CI. Add assertions (or raise) on failed to enforce the checks.

✅ Make pytest fail on test failures
 def check_hook(command: str, should_block: bool) -> bool:
@@
     print(f"  {status}: {command!r}")
     return True

+def _assert_no_failures(passed: int, failed: int, label: str) -> None:
+    assert failed == 0, f"{label}: {failed} failure(s)"
+
 def test_extract_commands():
@@
-    return passed, failed
+    _assert_no_failures(passed, failed, "extract_commands")
+    return passed, failed

Apply the same pattern to the rest of the test_* functions.

🧰 Tools
🪛 GitHub Actions: CI

[error] 78-78: W293: Blank line contains whitespace


[error] 81-81: W293: Blank line contains whitespace

🤖 Prompt for AI Agents
In tests/test_security.py around lines 75-123, tests like
test_extract_commands and helpers like check_hook currently only print and
return values so pytest treats them as passing; update each test_* function
(e.g., test_extract_commands and any other test_* in this file) to assert on
failures instead of returning counts by adding an assertion such as assert
failed == 0 (or assert passed == expected_count) at the end, and update
check_hook to raise an AssertionError or return a boolean used by an assert in
callers (e.g., assert check_hook(cmd, should_block) is True) so any mismatch
fails the test run; ensure each test uses pytest-style assertions rather than
print+return.
