ci(strict21): add canonical policy + preflight gate #15

Prekzursil wants to merge 1 commit into main.
Conversation
Co-authored-by: Codex <noreply@openai.com>
Review Summary by Qodo

Add strict-21 preflight validation with canonical policy
Walkthrough

Description

- Add strict-21 preflight validation script comparing canonical contexts
- Define canonical branch protection policy with 21 required status checks
- Implement GitHub Actions workflow for automated preflight checks on PRs
- Validate branch protection and emitted check-runs against canonical contexts

Diagram

```mermaid
flowchart LR
    A["Canonical Contexts List"] -->|"Compare against"| B["Branch Protection Policy"]
    A -->|"Compare against"| C["Emitted Check-Runs"]
    B -->|"Evaluate"| D["Preflight Script"]
    C -->|"Evaluate"| D
    D -->|"Generate"| E["JSON Report"]
    D -->|"Generate"| F["Markdown Report"]
    G["GitHub Actions Workflow"] -->|"Trigger"| D
```
File Changes

1. scripts/strict21_preflight.py
Code Review by Qodo
1. Contexts don't match repo
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.
```python
req = Request(
    url,
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
        "User-Agent": "reframe-strict21-preflight",
    },
    method="GET",
)
```
Check failure — Code scanning / CodeQL: Partial server-side request forgery (Critical)

Copilot Autofix (AI, about 12 hours ago)
In general, to fix partial SSRF when part of a URL is user-controlled, you either (a) restrict the user to selecting from a set of known-good base URLs, or (b) strictly validate the supplied URL to ensure it targets only allowed hosts and protocols and does not point to internal or loopback addresses.
For this script, the best fix with minimal functional impact is to validate `args.api_base` before it is used in `_api_get`. We can introduce a small helper that parses the URL and enforces that the host belongs to a small whitelist of allowed GitHub API hosts (for example, `api.github.com`, optionally extended with explicit GitHub Enterprise hosts if you need to support them). Because we must not assume more about the environment than we see, a conservative approach is to:
- Parse `api_base` with `urlparse`.
- Ensure the scheme is `https` (optionally allow `http` only if you truly need it; current code allows both, but tightening to `https` is safer).
- Ensure the hostname is one of a small allowed set, e.g., `api.github.com`.
This helper will be called in `main()` to normalize/validate `args.api_base` before passing it to `_run_preflight`, so that `_api_get` only ever receives a vetted base URL. This preserves existing behavior for normal usage (default `https://api.github.com`) while preventing redirection to attacker-chosen hosts.
Concretely:

- Add a new function, for example `_normalize_api_base(raw: str) -> str`, near `_api_get`, that applies these checks and returns a clean base URL string.
- In `main()`, after parsing `args` and before calling `_run_preflight`, replace uses of `args.api_base` with a validated `api_base` variable produced by `_normalize_api_base`.
- Adjust `_run_preflight` to accept and use this validated `api_base` instead of the raw, unvalidated `args.api_base`.
No new imports are required; `urlparse` is already imported and used.
```diff
@@ -71,6 +71,26 @@
     return "inconclusive_permissions" if code in PERMISSION_HTTP_CODES else "api_error"


+def _normalize_api_base(raw: str) -> str:
+    """
+    Validate and normalize the GitHub API base URL.
+
+    This ensures that the base URL uses HTTPS and targets an expected GitHub API host,
+    preventing arbitrary redirection to attacker-controlled endpoints.
+    """
+    parsed = urlparse(raw)
+    if parsed.scheme not in {"https"}:
+        raise ValueError(f"Unsupported API scheme in {raw!r}; only 'https' is allowed")
+    hostname = parsed.hostname or ""
+    # Restrict to the public GitHub API host. Extend this list explicitly if needed.
+    allowed_hosts = {"api.github.com"}
+    if hostname not in allowed_hosts:
+        raise ValueError(f"Unsupported API host in {raw!r}")
+    # Reconstruct a normalized base without path/query/fragment
+    netloc = parsed.netloc
+    return f"{parsed.scheme}://{netloc}"
+
+
 def _api_get(api_base: str, repo: str, path: str, token: str) -> dict[str, Any]:
     url = f"{api_base.rstrip('/')}/repos/{repo}/{path.lstrip('/')}"
     parsed = urlparse(url)
@@ -228,15 +248,16 @@
     args: argparse.Namespace,
     canonical: list[str],
     token: str,
+    api_base: str,
 ) -> tuple[PreflightResult, list[str], list[str]]:
     try:
-        protection = _api_get(args.api_base, args.repo, f"branches/{args.branch}/protection", token)
-        ref_payload = _api_get(args.api_base, args.repo, f"commits/{args.ref}", token)
+        protection = _api_get(api_base, args.repo, f"branches/{args.branch}/protection", token)
+        ref_payload = _api_get(api_base, args.repo, f"commits/{args.ref}", token)
         ref_sha = str(ref_payload.get("sha") or "").strip() or None
         if ref_sha is None:
             raise RuntimeError(f"Unable to resolve SHA for ref {args.ref!r}")
-        check_runs = _api_get(args.api_base, args.repo, f"commits/{ref_sha}/check-runs?per_page=100", token)
-        status_payload = _api_get(args.api_base, args.repo, f"commits/{ref_sha}/status", token)
+        check_runs = _api_get(api_base, args.repo, f"commits/{ref_sha}/check-runs?per_page=100", token)
+        status_payload = _api_get(api_base, args.repo, f"commits/{ref_sha}/status", token)

         branch_required_checks = sorted((protection.get("required_status_checks") or {}).get("contexts") or [])
         emitted_contexts = _collect_emitted_contexts(check_runs, status_payload)
@@ -264,10 +280,12 @@
     if not token:
         result, branch_required_checks, emitted_contexts = _missing_token_result()
     else:
+        api_base = _normalize_api_base(args.api_base)
         result, branch_required_checks, emitted_contexts = _run_preflight(
             args=args,
             canonical=canonical,
             token=token,
+            api_base=api_base,
         )

     payload = {
```
```python
        "http_error": result.http_error,
    }

    out_json.parent.mkdir(parents=True, exist_ok=True)
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High)
```python
    }

    out_json.parent.mkdir(parents=True, exist_ok=True)
    out_md.parent.mkdir(parents=True, exist_ok=True)
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High)

Copilot Autofix (AI, about 12 hours ago)
In general, to fix uncontrolled path usage you normalize the user-provided path and ensure it stays within a trusted base directory, or otherwise constrain it via an allow‑list or safe filename transformation. Here, we only need to write result files; we can safely constrain both out_json and out_md to reside under a base output directory (e.g. the current working directory) by resolving them and verifying they don’t escape that base.
The best fix with minimal behavior change is:
- Define a small helper, e.g. `_safe_subpath(base: Path, target: Path) -> Path`, that:
  - Resolves `base` to an absolute path (e.g. `base.resolve()`).
  - Resolves `target` as if under that base (e.g. `(base / target).resolve()`).
  - Checks that the resolved target is within the base using `Path.is_relative_to` (Py3.9+) or a `relative_to` try/except fallback.
  - Raises a `ValueError` if the target escapes the base.
- In `main()`, instead of using `Path(args.out_json)` and `Path(args.out_md)` directly, compute them via this helper with a base of `Path.cwd()` (or another appropriate base, but we must not assume more than what we see); e.g.: `base_output = Path.cwd().resolve()`, `out_json = _safe_subpath(base_output, Path(args.out_json))`, `out_md = _safe_subpath(base_output, Path(args.out_md))`.
- Leave the later calls (`out_json.parent.mkdir(...)`, `write_text(...)`) unchanged; they now operate on validated paths.

This keeps functionality (the user can still specify nested paths under the working directory) while preventing directory traversal or writing outside the intended workspace.
```diff
@@ -67,6 +67,26 @@
     return parser.parse_args()


+def _safe_subpath(base: Path, target: Path) -> Path:
+    """
+    Return a resolved path for ``target`` under ``base``, ensuring it does not escape ``base``.
+    """
+    base_resolved = base.resolve()
+    candidate = (base_resolved / target).resolve()
+    try:
+        # Python 3.9+ provides is_relative_to; fall back to relative_to for older versions.
+        is_relative = candidate.is_relative_to(base_resolved)  # type: ignore[attr-defined]
+    except AttributeError:
+        try:
+            candidate.relative_to(base_resolved)
+            is_relative = True
+        except ValueError:
+            is_relative = False
+    if not is_relative:
+        raise ValueError(f"Output path {candidate} escapes base directory {base_resolved}")
+    return candidate
+
+
 def _classify_http_status(code: int) -> str:
     return "inconclusive_permissions" if code in PERMISSION_HTTP_CODES else "api_error"


@@ -255,8 +275,9 @@

 def main() -> int:
     args = _parse_args()
-    out_json = Path(args.out_json)
-    out_md = Path(args.out_md)
+    base_output = Path.cwd()
+    out_json = _safe_subpath(base_output, Path(args.out_json))
+    out_md = _safe_subpath(base_output, Path(args.out_md))
     canonical = _canonical_contexts(args.canonical_contexts)

     token = (os.environ.get("GITHUB_TOKEN") or "").strip() or (os.environ.get("GH_TOKEN") or "").strip()
```
```python
    out_json.parent.mkdir(parents=True, exist_ok=True)
    out_md.parent.mkdir(parents=True, exist_ok=True)
    out_json.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High)

Copilot Autofix (AI, about 12 hours ago)
In general, to fix uncontrolled path usage for CLI-provided output paths, we should normalize the path and then either (a) restrict it to lie within a safe root directory, or (b) otherwise sanitize it so that dangerous constructs like .. or absolute paths are rejected. For a reporting/preflight script, a common pattern is to write all outputs under a known working directory (for example, the current directory or a configurable base directory) and ensure user-specified paths cannot escape that root.
For this specific script, the minimal, functionality-preserving fix is to introduce a safe output root (e.g., the current working directory) and validate that --out-json and --out-md resolve to paths within that root after normalization. We can use Path.resolve() to normalize and obtain absolute paths, and then check that each resolved path has the root as one of its parents (or is equal to the root). If a user attempts to supply a path outside the root (/etc/passwd, ../../secret), we raise a clear error instead of writing. This keeps the intended ability to choose output filenames and subdirectories but defends against directory traversal and writes to unexpected locations.
Concretely, in main() in scripts/strict21_preflight.py, around lines 257–260 and before we use out_json and out_md, we will:

- Define a root directory, such as `output_root = Path.cwd().resolve()`.
- Convert the arguments to paths and resolve them: `out_json = Path(args.out_json).resolve()`, `out_md = Path(args.out_md).resolve()`.
- Verify `output_root` is a parent of both resolved paths (or equal). If not, raise `ValueError` with an explanatory message.
- Use the validated `out_json` and `out_md` as before (mkdir, write).

This change only affects the handling of the two output arguments and does not modify imports or other functionality.
```diff
@@ -255,8 +255,13 @@

 def main() -> int:
     args = _parse_args()
-    out_json = Path(args.out_json)
-    out_md = Path(args.out_md)
+    output_root = Path.cwd().resolve()
+    out_json = Path(args.out_json).resolve()
+    out_md = Path(args.out_md).resolve()
+    if output_root not in (out_json, *out_json.parents):
+        raise ValueError(f"Output JSON path must be under {output_root}, got {out_json}")
+    if output_root not in (out_md, *out_md.parents):
+        raise ValueError(f"Output markdown path must be under {output_root}, got {out_md}")
     canonical = _canonical_contexts(args.canonical_contexts)

     token = (os.environ.get("GITHUB_TOKEN") or "").strip() or (os.environ.get("GH_TOKEN") or "").strip()
```
```python
    out_json.parent.mkdir(parents=True, exist_ok=True)
    out_md.parent.mkdir(parents=True, exist_ok=True)
    out_json.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    out_md.write_text(_render_markdown(payload), encoding="utf-8")
```
Check failure — Code scanning / CodeQL: Uncontrolled data used in path expression (High)

Copilot Autofix (AI, about 12 hours ago)
In general, to fix uncontrolled path usage for files derived from user input, you should constrain where files can be written and/or normalize and validate the path before use. A common pattern is to define a fixed “output root” directory, resolve user-provided paths relative to that root, normalize via Path.resolve() or os.path.realpath(), and then ensure the final path remains within the root. If it does not, fail with an error instead of writing the file. This prevents ../ traversal and disallows redirecting output to unintended locations.
For this script, the minimal fix without changing observable behavior too much is:
- Introduce an output root directory (e.g., current working directory by default).
- Resolve `out_json` and `out_md` relative to that root.
- Normalize them with `Path.resolve(strict=False)` (or `resolve()`), and then check that their resolved paths are within the root directory using `Path.is_relative_to()` if available, or an equivalent check with `relative_to()` in a try/except.
- If the check fails, raise a clear exception instead of writing.

Since we must only modify the shown code in scripts/strict21_preflight.py, we can:

- Compute a `base_dir = Path.cwd().resolve()` in `main()`.
- Replace the simple `Path(args.out_json)` / `Path(args.out_md)` with validated helper calls like `_safe_output_path(base_dir, args.out_json)` and `_safe_output_path(base_dir, args.out_md)`.
- Add a small helper function `_safe_output_path(base_dir: Path, user_path: str) -> Path` above `main()` that:
  - Creates a `Path(user_path)`.
  - Joins with `base_dir` if it is relative; if it is absolute, uses it as-is but still validates it against `base_dir` (or, alternatively, disallows absolute paths).
  - Calls `.resolve()` and checks that the resolved path is under `base_dir`. If not, raises `ValueError`.

This retains the ability to specify file names relative to the current directory and prevents directory traversal outside the output root. No new imports are needed; `Path` is already imported.
```diff
@@ -90,6 +90,26 @@
     return json.loads(resp.read().decode("utf-8"))


+def _safe_output_path(base_dir: Path, user_path: str) -> Path:
+    """
+    Construct a normalized output path under ``base_dir`` from a user-provided
+    path. Raises ValueError if the resulting path would escape ``base_dir``.
+    """
+    raw_path = Path(user_path)
+
+    if not raw_path.is_absolute():
+        candidate = (base_dir / raw_path).resolve()
+    else:
+        candidate = raw_path.resolve()
+
+    try:
+        candidate.relative_to(base_dir)
+    except ValueError:
+        raise ValueError(f"Output path {candidate!r} is outside allowed directory {base_dir!r}")
+
+    return candidate
+
+
 def _canonical_contexts(raw: str) -> list[str]:
     if not raw.strip():
         return list(DEFAULT_CANONICAL_CONTEXTS)
@@ -255,8 +275,9 @@

 def main() -> int:
     args = _parse_args()
-    out_json = Path(args.out_json)
-    out_md = Path(args.out_md)
+    base_dir = Path.cwd().resolve()
+    out_json = _safe_output_path(base_dir, args.out_json)
+    out_md = _safe_output_path(base_dir, args.out_md)
     canonical = _canonical_contexts(args.canonical_contexts)

     token = (os.environ.get("GITHUB_TOKEN") or "").strip() or (os.environ.get("GH_TOKEN") or "").strip()
```
```python
PERMISSION_HTTP_CODES = {401, 403, 404}
DEFAULT_CANONICAL_CONTEXTS = [
    "applitools-core",
    "pr-agent",
    "deep-agent",
    "audit-pr-evidence",
    "backend",
    "backend-postgres",
    "coverage",
    "CodeQL",
    "codecov-analytics",
    "Analyze (actions)",
    "Analyze (javascript-typescript)",
    "Analyze (python)",
    "CodeRabbit",
    "dependency-review",
    "compose-smoke",
    "frontend",
    "label",
    "codacy-equivalent-zero",
    "sonar-branch-zero",
    "Seer Code Review",
    "SonarCloud Code Analysis",
]
```
1. Contexts don't match repo — 🐞 Bug · ✓ Correctness
The preflight enforces a large hard-coded canonical context list and fails when any are missing from branch protection or emitted checks. In this repo, the documented required check is only `verify` and the workflows shown do not define most canonical contexts, so strict21-preflight is very likely to mark PRs `non_compliant` and block merges once required.
Agent Prompt
## Issue description
The strict-21 preflight enforces a hard-coded canonical contexts list that appears inconsistent with the repo’s documented/defined CI checks (e.g., `verify`). This likely causes `non_compliant` results and blocks PRs.
## Issue Context
- The script defaults to `DEFAULT_CANONICAL_CONTEXTS` and fails when any context is missing.
- Repo documentation and workflows indicate `verify` is the required check today.
## Fix Focus Areas
- scripts/strict21_preflight.py[15-38]
- scripts/strict21_preflight.py[93-97]
- scripts/strict21_preflight.py[99-128]
- .github/workflows/strict21-preflight.yml[20-32]
- .github/BRANCH_PROTECTION.md[5-12]
- .github/workflows/verify.yml[13-17]
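The mismatch this issue describes can be reproduced in isolation. The context values here are hypothetical illustrations (with `verify` standing in as the repo's only documented required check), not output from the actual script:

```python
# The preflight is non-compliant whenever any canonical context is missing
# from branch protection or from the emitted check-runs.
canonical = ["CodeQL", "frontend", "verify"]   # small stand-in for the hard-coded list
branch_required = ["verify"]                   # what the repo documents today
emitted = ["CodeQL", "verify"]                 # check-runs seen on the ref

missing_from_protection = sorted(set(canonical) - set(branch_required))
missing_from_emitted = sorted(set(canonical) - set(emitted))
compliant = not missing_from_protection and not missing_from_emitted

print(missing_from_protection)  # → ['CodeQL', 'frontend']
print(missing_from_emitted)     # → ['frontend']
print(compliant)                # → False
```

Any canonical context the repo does not actually require or emit makes the result non-compliant, which is why the hard-coded list must match the repo's real CI checks.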
```python
def _classify_http_status(code: int) -> str:
    return "inconclusive_permissions" if code in PERMISSION_HTTP_CODES else "api_error"
```
2. Preflight fails open on 403/404 — 🐞 Bug · ⛨ Security
API permission/availability failures (including HTTP 401/403/404) are classified as inconclusive_permissions and the script exits 0, so the workflow can succeed without having validated compliance. This undermines the intended “preflight gate”.
Agent Prompt
## Issue description
The preflight gate succeeds (exit 0) when it cannot validate compliance due to missing token or certain HTTP errors (401/403/404). This is a fail-open pattern for a protection gate.
## Issue Context
- HTTP 401/403/404 map to `inconclusive_permissions`.
- `main()` returns failure only for `non_compliant` or `api_error`.
## Fix Focus Areas
- scripts/strict21_preflight.py[15-16]
- scripts/strict21_preflight.py[70-72]
- scripts/strict21_preflight.py[189-197]
- scripts/strict21_preflight.py[295-297]



Problem / Outcome
Start strict-21 rollout (Vercel-free) with canonical context policy and preflight gate.
Scope and Success Criteria
Risk Classification
risk:medium
Rollback Plan
Revert this branch if strict preflight blocks valid PRs unexpectedly.
Evidence Paths