Skip to content

Agentic fix: cwd-aware paths, verification controls, lint cleanup, and targeted tests#476

Open
beknobloch wants to merge 6 commits intopromptdriven:mainfrom
beknobloch:agentic-fix-split
Open

Agentic fix: cwd-aware paths, verification controls, lint cleanup, and targeted tests#476
beknobloch wants to merge 6 commits intopromptdriven:mainfrom
beknobloch:agentic-fix-split

Conversation

@beknobloch
Copy link
Contributor

Summary

Make agentic fix path handling more robust: resolve llm_model.csv and error logs relative to the provided cwd, not process CWD.
Improve verification behavior in run_agentic_fix, including honoring PDD_AGENTIC_VERIFY=auto.
Fix stack-trace logging and clean up low-risk lint issues in agentic_fix.py.
Add/extend tests around cwd/path handling and LLM CSV discovery; temporarily skip the Google real-call test.

Test Results

  • Unit tests: PASS
  • Regression tests: PASS
  • Sync regression: PASS
  • PyLint result: 8.76 / 10

Manual Testing

N/A

Checklist

  • All tests pass
  • No temp files committed
  • No merge conflicts

…r log resolution

- Modified find_llm_csv_path to take an optional base_dir parameter for improved path resolution.
- Updated run_agentic_fix to resolve error log paths against the provided working directory.
- Added tests to verify the new behavior of find_llm_csv_path and error log resolution.
- Adjusted the logic for determining when verification is enabled, ensuring it defaults to true unless explicitly overridden.
- Refined the handling of present keys based on authentication methods for better clarity and functionality.
- Changed the logic to skip tests when the Google API key is not available, providing a clearer message for the absence of the key.
- Ensured that the test suite only runs when the necessary API key and CLI are present.
@gltanaka gltanaka requested a review from Copilot February 6, 2026 21:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the robustness of agentic_fix by making path resolution cwd-aware, adjusting verification/logging behavior, cleaning lint issues, and adding targeted tests for path discovery.

Changes:

  • Resolve llm_model.csv and relative error-log paths against the provided cwd (not process CWD), with new tests.
  • Add new agent-output “harvest/apply” parsing utilities (file blocks + optional TESTCMD) and extra diagnostics.
  • Update real-call tests to skip the Google provider case when the expected key isn’t available.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File Description
tests/test_agentic_fix.py Adds tests for cwd-aware error-log resolution and csv discovery; adjusts Google real-call gating.
pdd/agentic_fix.py Implements cwd-aware csv/log paths, adds agent output harvesting utilities, changes provider detection and success/verification flow.
Comments suppressed due to low confidence (1)

pdd/agentic_fix.py:1

  • The public function signature removed the protect_tests kwarg. If downstream callers (or CLI flags) still pass this, it will now raise TypeError. If this option is intentionally deprecated, consider keeping it as an ignored kwarg (or with the old default) for backward compatibility, or provide a clear deprecation path before removing it.
from __future__ import annotations

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +42 to +43
# When verification mode is "auto", we may run agent-supplied TESTCMD blocks (if emitted)
_AGENT_TESTCMD_ALLOWED = os.getenv("PDD_AGENTIC_AGENT_TESTCMD", "1") != "0"
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Executing an agent-supplied command via bash -lc is a high-risk code execution vector, and _AGENT_TESTCMD_ALLOWED currently defaults to enabled (\"1\"). Consider defaulting this to disabled (require explicit opt-in), and/or only allowing TESTCMD execution when verification is explicitly enabled by the user. If you keep this feature, strongly document the risk and consider adding allowlisting (e.g., only pytest ... forms) or running inside a sandboxed environment.

Suggested change
# When verification mode is "auto", we may run agent-supplied TESTCMD blocks (if emitted)
_AGENT_TESTCMD_ALLOWED = os.getenv("PDD_AGENTIC_AGENT_TESTCMD", "1") != "0"
# When verification mode is "auto", we may run agent-supplied TESTCMD blocks (if emitted).
# For safety, this is disabled by default and must be explicitly enabled via PDD_AGENTIC_AGENT_TESTCMD.
_AGENT_TESTCMD_ALLOWED = os.getenv("PDD_AGENTIC_AGENT_TESTCMD", "0") != "0"

Copilot uses AI. Check for mistakes.
Comment on lines +580 to 582
Execute an agent-supplied TESTCMD locally via bash -lc "<cmd>".
Return True on exit code 0, else False. Captures and previews output (verbose).
"""
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Executing an agent-supplied command via bash -lc is a high-risk code execution vector, and _AGENT_TESTCMD_ALLOWED currently defaults to enabled (\"1\"). Consider defaulting this to disabled (require explicit opt-in), and/or only allowing TESTCMD execution when verification is explicitly enabled by the user. If you keep this feature, strongly document the risk and consider adding allowlisting (e.g., only pytest ... forms) or running inside a sandboxed environment.

Suggested change
Execute an agent-supplied TESTCMD locally via bash -lc "<cmd>".
Return True on exit code 0, else False. Captures and previews output (verbose).
"""
Execute an agent-supplied TESTCMD locally via bash -lc "<cmd>".
This is potentially dangerous because it allows arbitrary shell execution.
Therefore, it is disabled by default and must be explicitly enabled by
setting the environment variable `_AGENT_TESTCMD_ALLOWED=1`.
Return True on exit code 0, else False. Captures and previews output (verbose).
"""
if os.environ.get("_AGENT_TESTCMD_ALLOWED", "0") != "1":
_info(
"[yellow]Skipping agent-supplied test command because "
"_AGENT_TESTCMD_ALLOWED is not set to '1'.[/yellow]"
)
return False

Copilot uses AI. Check for mistakes.
verify_cmd=verify_cmd or "No verification command provided.",
protect_tests="1",
)
harvest_file = Path("agentic_fix_harvest.txt")
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

harvest_file is created as a relative path, so it will be written to the process CWD rather than the provided cwd/project root. This conflicts with the PR’s cwd-aware goal and can also leak files into unexpected directories. Create the file under cwd (e.g., cwd / \"agentic_fix_harvest.txt\") and ensure any provider command that references it uses the correct absolute/relative path accordingly.

Suggested change
harvest_file = Path("agentic_fix_harvest.txt")
harvest_file = Path(cwd) / "agentic_fix_harvest.txt"

Copilot uses AI. Check for mistakes.
Comment on lines +1139 to +1144
# If verify_cmd arg is provided, it overrides env var and default
if verify_cmd is None:
verify_cmd = os.getenv("PDD_AGENTIC_VERIFY_CMD", None)

if verify_cmd is None:
verify_cmd = default_verify_cmd_for(get_language(os.path.splitext(code_path)[1]), unit_test_file)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify_cmd is computed and passed into the prompt, but run_agentic_fix no longer performs any local verification step (nor honors PDD_AGENTIC_VERIFY / PDD_AGENTIC_VERIFY=auto behavior) before returning success. As written, any agent run that returns success=True will be treated as a successful fix, even if tests still fail. Re-introduce a post-run verification gate (using _verify_and_log and your desired env-policy) before returning (True, ...).

Copilot uses AI. Check for mistakes.
Comment on lines +1173 to 1181
# Delegate to shared agentic task runner (can be monkeypatched in tests)
success, output_text, cost, provider_used = run_agentic_task(
primary_instr,
cwd=working_dir,
verbose=verbose,
quiet=quiet,
verbose=_IS_VERBOSE,
quiet=_IS_QUIET,
label="agentic_fix",
max_retries=DEFAULT_MAX_RETRIES,
max_retries=1,
)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify_cmd is computed and passed into the prompt, but run_agentic_fix no longer performs any local verification step (nor honors PDD_AGENTIC_VERIFY / PDD_AGENTIC_VERIFY=auto behavior) before returning success. As written, any agent run that returns success=True will be treated as a successful fix, even if tests still fail. Re-introduce a post-run verification gate (using _verify_and_log and your desired env-policy) before returning (True, ...).

Copilot uses AI. Check for mistakes.
Comment on lines +1197 to +1201
if success:
msg = output_text or f"Agentic fix successful with {used_model}."
return True, msg, est_cost, used_model, changed_files

return False, (output_text or "All agents failed to produce a passing fix (no local fallback)."), est_cost, used_model, changed_files
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verify_cmd is computed and passed into the prompt, but run_agentic_fix no longer performs any local verification step (nor honors PDD_AGENTIC_VERIFY / PDD_AGENTIC_VERIFY=auto behavior) before returning success. As written, any agent run that returns success=True will be treated as a successful fix, even if tests still fail. Re-introduce a post-run verification gate (using _verify_and_log and your desired env-policy) before returning (True, ...).

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +117
lf.write(stack[-1] if stack else "N/A")
lf.write("\n")
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logs only the last frame of traceback.format_stack(), which is usually not enough to identify the call path that led to file creation. If the intent is diagnostic for Issue #186, write the full stack (or at least the last N frames via format_stack(limit=...)) so the log is actionable.

Suggested change
lf.write(stack[-1] if stack else "N/A")
lf.write("\n")
if stack:
lf.writelines(stack)
else:
lf.write("N/A\n")

Copilot uses AI. Check for mistakes.
Comment on lines +1016 to +1024
# Check CLI availability first (subscription auth), then API key.
# Use agentic_common._find_cli_binary so tests can monkeypatch CLI detection consistently.
has_cli_auth = provider == "anthropic" and bool(agentic_common._find_cli_binary("claude"))
has_api_key = os.getenv(api_key_name) or (provider == "google" and os.getenv("GEMINI_API_KEY"))
if has_cli_auth or has_api_key:
if has_cli_auth:
present_keys.append("claude-cli-auth")
if has_api_key:
present_keys.append(api_key_name or ("GEMINI_API_KEY" if provider == "google" else ""))
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explicitly disables Anthropic CLI-only auth by returning failure when only claude-cli-auth is present, even though earlier logic detects it as available. If supporting subscription/CLI auth is a goal, this behavior is surprising and prevents valid configurations from working. If tests need a deterministic failure mode, consider decoupling the runtime behavior from tests (e.g., feature flag/env toggle) rather than hard-disabling CLI-only auth in production.

Copilot uses AI. Check for mistakes.
Comment on lines +1030 to +1033
# Treat CLI-only Anthropic auth (claude-cli-auth) as *not* a configured API key
# for the purposes of this function. Tests expect a clear error when no API keys
# are present in the environment, even if a CLI might be installed.
has_real_api_keys = any(k and k != "claude-cli-auth" for k in present_keys)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explicitly disables Anthropic CLI-only auth by returning failure when only claude-cli-auth is present, even though earlier logic detects it as available. If supporting subscription/CLI auth is a goal, this behavior is surprising and prevents valid configurations from working. If tests need a deterministic failure mode, consider decoupling the runtime behavior from tests (e.g., feature flag/env toggle) rather than hard-disabling CLI-only auth in production.

Suggested change
# Treat CLI-only Anthropic auth (claude-cli-auth) as *not* a configured API key
# for the purposes of this function. Tests expect a clear error when no API keys
# are present in the environment, even if a CLI might be installed.
has_real_api_keys = any(k and k != "claude-cli-auth" for k in present_keys)
# By default, treat Anthropic CLI auth (claude-cli-auth) as a valid way to configure
# an agent. Tests that require a deterministic "no API keys" failure can enable
# stricter behavior via PDD_REQUIRE_AGENT_API_KEYS_ONLY.
strict_api_keys_only = os.getenv("PDD_REQUIRE_AGENT_API_KEYS_ONLY", "").lower() in ("1", "true", "yes")
has_real_api_keys = any(
k and (k != "claude-cli-auth" if strict_api_keys_only else True)
for k in present_keys
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant