Skip to content

Release/v1.0.15#98

Merged
omsherikar merged 19 commits intomainfrom
release/v1.0.15
Feb 8, 2026
Merged

Release/v1.0.15#98
omsherikar merged 19 commits intomainfrom
release/v1.0.15

Conversation

@omsherikar
Copy link
Contributor

@omsherikar omsherikar commented Feb 8, 2026

Summary by CodeRabbit

  • New Features

    • RAG-backed code indexing & retrieval, LLM-driven refactoring suggestions and documentation, plus backend and Groq LLM clients.
    • New CLI commands for repo management, RAG (index/search/status), suggest, and document; workspace mapping and repo connect/disconnect.
  • Improvements

    • Stronger security analysis for command/SQL risks, anonymized pattern fingerprinting, improved project-root detection, and a real refactor-apply flow.
    • Workspace persistence and repository detection enhancements.
  • Dependencies

    • Version bumped to 1.0.15; new runtime libraries added.
  • Tests

    • Expanded unit/integration tests covering RAG, LLM, safety, CLI, and fingerprinting.

Features:
- LLM orchestration system with multi-backend support (Groq, OpenAI)
- RAG-based code retrieval with FAISS indexing
- Repository and workspace management
- ML infrastructure foundation (AI module, metrics, analysis tools)

Bug Fixes:
- Fixed 9 critical test failures (security rule IDs, pattern fingerprinting, ranking)
- All 716 tests now passing with 73% coverage
- Resolved anonymization behavior in pattern learning

Improvements:
- Enhanced security analyzer with better context awareness
- Improved CLI feedback and error messages
- Code quality and linting improvements

Breaking Changes: None
Copilot AI review requested due to automatic review settings February 8, 2026 11:52
@coderabbitai
Copy link

coderabbitai bot commented Feb 8, 2026

Warning

Rate limit exceeded

@omsherikar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 27 minutes and 54 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Adds RAG indexing/parsing, LLM integration (Groq + backend), safety gating and orchestration, workspace/repo management and CLI expansions, alias-aware security analysis, pattern anonymization/fingerprinting, many tests and utilities, example test project files, and bumps package version and runtime dependencies.

Changes

Cohort / File(s) Summary
RAG infrastructure
refactron/rag/__init__.py, refactron/rag/parser.py, refactron/rag/chunker.py, refactron/rag/indexer.py, refactron/rag/retriever.py
Adds tree-sitter CodeParser, CodeChunk/CodeChunker, RAGIndexer (ChromaDB + SentenceTransformer), and ContextRetriever with metadata, embeddings, persistence, and error handling; new dataclasses for parsed entities and index stats.
LLM integration & safety
refactron/llm/__init__.py, refactron/llm/client.py, refactron/llm/backend_client.py, refactron/llm/orchestrator.py, refactron/llm/models.py, refactron/llm/prompts.py, refactron/llm/safety.py
Introduces GroqClient and BackendLLMClient, LLMOrchestrator (context retrieval, prompt building, parsing, confidence handling), RefactoringSuggestion models, prompt templates, and SafetyGate with syntax/risk/import checks.
CLI, repo & workspace flows
refactron/cli.py, refactron/core/workspace.py, refactron/core/repositories.py
Extensive CLI additions (repo/rag groups, suggest/document, interactive selectors), WorkspaceManager and WorkspaceMapping persistence, and GitHub repository listing with robust API/error handling.
Core/refactor plumbing
refactron/core/refactor_result.py, refactron/core/refactron.py, refactron/core/config.py, refactron/core/models.py
Implements apply() to apply refactor ops to files, exposes get_python_files publicly, adds .refactron-backup to excludes, and adds DOCUMENTATION IssueCategory.
Security analyzer
refactron/analyzers/security_analyzer.py
Adds alias map resolution, improved full-name resolution for imports/aliases, expanded dangerous-call detection (os.system/os.popen + subprocess shell handling), SQL injection detection (f-strings/.format), and SyntaxError reporting as SECURITY issues.
Pattern fingerprinting
refactron/patterns/fingerprint.py
Replaces normalization with anonymization, generalizes AST tokens (FUNC/CLASS), and updates fingerprint/refactoring hashing to use anonymized code.
Refactor-related packages & versioning
refactron/__init__.py, pyproject.toml
Bumps version to 1.0.15 and adds runtime dependencies: chromadb, tree-sitter, tree-sitter-python, sentence-transformers, groq, pydantic.
Complex test repo & utilities
complex_test_repo/core/engine.py, complex_test_repo/data/processor.py, complex_test_repo/main.py, complex_test_repo/utils/*, complex_test_repo/core/__init__.py, complex_test_repo/data/__init__.py
Adds example ProcessingEngine (in-memory DB, execute/query_user/process_items), data processors, utility math/string helpers, module docstrings, and a main script demonstrating usage.
Scripts & tooling
scripts/analyze_feedback_data.py
New feedback analysis script aggregating pattern/feedback stats, computing ML-readiness, and writing a JSON report.
Tests
tests/* (many files)
Adds or updates extensive tests: RAG parser/chunker/indexer/retriever, Groq and backend clients, LLMOrchestrator, SafetyGate, fingerprint anonymization, CLI flows, pattern learning/ranking behavior; adjusts security rule IDs and fingerprint expectations.
Miscellaneous
.gitignore, package init files, small package initializers
Adds ignore entries for test artifacts and adds/updates package init docstrings and re-exports (llm, rag, utils, data).

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant RAGIndexer
    participant CodeParser
    participant CodeChunker
    participant SentenceTransformer
    participant ChromaDB

    CLI->>RAGIndexer: index_repository(repo_path)
    RAGIndexer->>CodeParser: parse_file(file_path)
    CodeParser-->>RAGIndexer: ParsedFile
    RAGIndexer->>CodeChunker: chunk_file(parsed_file)
    CodeChunker-->>RAGIndexer: List[CodeChunk]
    RAGIndexer->>SentenceTransformer: encode(chunk.content)
    SentenceTransformer-->>RAGIndexer: embeddings
    RAGIndexer->>ChromaDB: add_documents(docs, embeddings, metadata)
    ChromaDB-->>RAGIndexer: ack
    RAGIndexer-->>CLI: IndexStats
Loading
sequenceDiagram
    actor User
    participant CLI
    participant LLMOrchestrator
    participant ContextRetriever
    participant ChromaDB
    participant LLMClient
    participant SafetyGate

    User->>CLI: suggest(target, issue_id)
    CLI->>LLMOrchestrator: generate_suggestion(issue, original_code)
    LLMOrchestrator->>ContextRetriever: retrieve_similar(query)
    ContextRetriever->>ChromaDB: query(embedding)
    ChromaDB-->>ContextRetriever: RetrievedContext[]
    ContextRetriever-->>LLMOrchestrator: contexts
    LLMOrchestrator->>LLMClient: generate(prompt with contexts)
    LLMClient-->>LLMOrchestrator: generated_text
    LLMOrchestrator->>SafetyGate: validate(suggestion)
    SafetyGate-->>LLMOrchestrator: SafetyCheckResult
    LLMOrchestrator-->>CLI: RefactoringSuggestion
    CLI-->>User: display suggestion
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

Poem

🐰 I hopped through trees of code and prompt,
I fetched the chunks and made them chomp.
I nudged the LLM, I checked for fright,
I scrubbed the names and hid them tight.
A rabbit cheers — index, suggest, delight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Release/v1.0.15' is a generic release marker that accurately reflects the changeset's purpose as a version bump release, covering the version update from 1.0.14 to 1.0.15 across multiple files and the addition of major new features (RAG, LLM, workspace management, CLI enhancements).
Docstring Coverage ✅ Passed Docstring coverage is 92.59% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch release/v1.0.15

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request refactoring security labels Feb 8, 2026
- Fixed F821 undefined name error in refactron/rag/indexer.py
- Added try/except import block for GroqClient type hint
- Ensures CI linting passes
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@refactron/core/refactor_result.py`:
- Around line 108-109: The loop currently skips non-existent files by "continue"
when "file_path.exists()" is false, which leaves the overall "success" flag
unchanged; update the block in refactor_result.py so that when
"file_path.exists()" is false you set "success = False" (and optionally record
an error message for that "file_path") before continuing, ensuring the function
(and its "success" variable) correctly reflects skipped/missing targets.
- Around line 118-124: The loop that applies edits (for op in ops) currently
skips operations when op.old_code is not found and still leaves the overall
success flag True; update the logic in the method in refactor_result.py so it
tracks per-operation failures: when op.old_code is not found in new_content,
mark a local flag (or set success = False) and record the failed op (e.g.,
collect op or its index) rather than silently passing, and ensure the returned
success reflects whether all ops were applied; keep the replacement behavior
(new_content.replace(..., 1)) but change the else branch to flip success to
False and optionally append the failed op to a failure list for callers to
inspect.

In `@refactron/core/repositories.py`:
- Around line 74-93: Remove the redundant "from datetime import datetime"
re-import and instead use the already imported datetime and timezone; for
parsing creds.expires_at when it's a string, replace only a trailing "Z" with
"+00:00" (to support ISO Zulu) and call datetime.fromisoformat(expires_str) so
arbitrary offsets are preserved, then if the resulting expires_at has no tzinfo
attach timezone.utc before comparing to datetime.now(timezone.utc); keep the
try/except around ValueError/AttributeError and raise the same RuntimeError when
now >= expires_at.

In `@refactron/llm/client.py`:
- Around line 75-80: The call to self.client.chat.completions.create uses
"temperature or self.temperature" and "max_tokens or self.max_tokens", which
treats valid values like 0.0 (temperature) or 0 (max_tokens) as falsy and
silently substitutes the defaults; change these to explicit None checks—e.g.
pass temperature if temperature is not None else self.temperature, and similarly
for max_tokens—to preserve intentional zero values; apply the same fix in the
corresponding backend_client.py site where the same pattern is used (search for
self.client.chat.completions.create and the temperature/max_tokens arguments).

In `@refactron/llm/orchestrator.py`:
- Around line 56-68: The NameError occurs because results may be undefined if
self.retriever.retrieve_similar raises; initialize results (e.g., results = [])
before the try block and populate context_snippets from results only after safe
assignment, and update any later checks that iterate results to use the
already-populated context_snippets or the initialized results variable
(references: self.retriever, retrieve_similar, results, context_snippets,
rag_context) so exceptions leave a defined empty fallback and no NameError
occurs.
🟠 Major comments (19)
refactron/core/refactor_result.py-126-127 (1)

126-127: ⚠️ Potential issue | 🟠 Major

No backup is created before overwriting files.

The write is destructive with no rollback path. The codebase already has a FileOperations helper with backup support (see tests/autofix/test_file_ops.py). Consider creating a backup before writing, or integrating with FileOperations.

refactron/core/refactor_result.py-96-103 (1)

96-103: ⚠️ Potential issue | 🟠 Major

apply() ignores self.preview_mode — destructive writes can happen in preview mode.

The class declares preview_mode: bool = True, implying a dry-run default, but apply() never checks it. A caller in preview mode would still have files overwritten.

Proposed fix
     def apply(self) -> bool:
         """Apply the refactoring operations to the files."""
+        if self.preview_mode:
+            return False
+
         # Group operations by file
refactron/patterns/fingerprint.py-211-216 (1)

211-216: ⚠️ Potential issue | 🟠 Major

Python 3.8 fallback silently skips anonymization, producing version-dependent fingerprints.

On Python 3.8 (which is supported per pyproject.toml), ast.unparse is unavailable, so this falls back to _normalize_code(code) — the original un-anonymized code. This means the same code will produce different fingerprints on Python 3.8 vs 3.9+, breaking cross-version pattern matching.

Consider either:

  1. Dropping Python 3.8 support (it's EOL since Oct 2024), or
  2. Adding a fallback using astunparse or ast.dump for 3.8.
refactron/llm/backend_client.py-99-104 (1)

99-104: 🛠️ Refactor suggestion | 🟠 Major

Chain exceptions with from e for proper tracebacks.

Ruff B904 flags this — re-raising without from e loses the original traceback context, making debugging harder.

Proposed fix
         except requests.exceptions.RequestException as e:
-            raise RuntimeError(f"Failed to connect to Refactron backend: {e}")
+            raise RuntimeError(f"Failed to connect to Refactron backend: {e}") from e
         except RuntimeError:
             raise
         except Exception as e:
-            raise RuntimeError(f"Unexpected error during backend LLM generation: {e}")
+            raise RuntimeError(f"Unexpected error during backend LLM generation: {e}") from e
refactron/llm/backend_client.py-70-76 (1)

70-76: ⚠️ Potential issue | 🟠 Major

temperature or self.temperature silently ignores 0.0.

0.0 is a valid temperature (deterministic output) but is falsy in Python, so temperature or self.temperature would discard it and use the default. The same pattern appears in the sibling GroqClient.generate (see refactron/llm/client.py line 79). Use an explicit None check instead.

Proposed fix
         payload = {
             "prompt": prompt,
             "system": system,
-            "temperature": temperature or self.temperature,
-            "max_tokens": max_tokens or self.max_tokens,
+            "temperature": temperature if temperature is not None else self.temperature,
+            "max_tokens": max_tokens if max_tokens is not None else self.max_tokens,
             "model": self.model,
         }
refactron/llm/prompts.py-55-62 (1)

55-62: ⚠️ Potential issue | 🟠 Major

Duplicate key "risk_score" in JSON schema.

Lines 58 and 59 both define "risk_score". The LLM will see a schema with a duplicate key, which is ambiguous — one of them likely should be a different field (perhaps "confidence" or similar), or one should be removed.

Proposed fix (assuming one is extraneous)
 Output valid JSON:
 {
     "safe": boolean,
     "risk_score": float (0.0-1.0),
-    "risk_score": float (0.0-1.0),
     "issues": [list of strings]
 }
refactron/core/repositories.py-95-110 (1)

95-110: ⚠️ Potential issue | 🟠 Major

No URL scheme validation — potential SSRF via file:// or custom schemes (S310).

api_base_url is passed as a string and fed directly into urlopen without scheme validation. If a malicious or misconfigured base URL uses file:/// or another scheme, it could read local files. Validate the scheme before making the request.

Proposed fix
     # Normalize the base URL
     base = api_base_url.rstrip("/")
     url = f"{base}/api/github/repositories"
+
+    # Validate URL scheme
+    if not url.startswith(("https://", "http://")):
+        raise RuntimeError(f"Invalid API URL scheme. Expected https:// or http://, got: {url}")
refactron/rag/chunker.py-91-117 (1)

91-117: ⚠️ Potential issue | 🟠 Major

Docstring is duplicated in chunk content — body already includes it.

func.body (from ParsedFunction) is the full function source including its docstring. Lines 96–97 prepend the docstring again, so it appears twice in the chunk content. This duplication will skew embeddings and inflate chunk size.

The same pattern occurs in _create_class_chunks for both class overview chunks (line 127) and method chunks (lines 157–159).

Proposed fix for _create_function_chunk
     def _create_function_chunk(self, func: ParsedFunction, file_path: str) -> CodeChunk:
         """Create a chunk for a function."""
         # Build content with context
-        content_parts = []
-
-        if func.docstring:
-            content_parts.append(f'"""{func.docstring}"""')
-
-        content_parts.append(func.body)
 
         # Add context header
         header = f"File: {file_path}, Function: {func.name}"
-        content = header + "\n" + "-" * len(header) + "\n\n" + "\n".join(content_parts)
+        content = header + "\n" + "-" * len(header) + "\n\n" + func.body

Apply the same pattern to method chunks in _create_class_chunks.

refactron/llm/safety.py-37-42 (1)

37-42: ⚠️ Potential issue | 🟠 Major

Likely wrong field: confidence_score is the adjusted score, not the raw LLM score.

Per the RefactoringSuggestion model, confidence_score is documented as "Adjusted score (safety gate result)" while llm_confidence is the "Raw score from LLM response". Since the safety gate is what produces the adjusted score, checking confidence_score here appears circular. Should this be suggestion.llm_confidence instead?

Proposed fix
         # 2. Confidence Check
-        if suggestion.confidence_score < self.min_confidence:
+        if suggestion.llm_confidence < self.min_confidence:
             issues.append(
-                f"Low confidence score: {suggestion.confidence_score:.2f} < {self.min_confidence}"
+                f"Low confidence score: {suggestion.llm_confidence:.2f} < {self.min_confidence}"
             )
             score *= 0.8
refactron/rag/parser.py-207-239 (1)

207-239: 🛠️ Refactor suggestion | 🟠 Major

_extract_function_docstring and _extract_class_docstring are identical — DRY violation.

These two methods have identical implementations. Extract a shared helper.

Proposed refactor
-    def _extract_function_docstring(self, node: Node, source: bytes) -> Optional[str]:
-        """Extract function docstring."""
-        body_node = node.child_by_field_name("body")
-        if not body_node:
-            return None
-
-        for child in body_node.children:
-            if child.type == "expression_statement":
-                string_node = child.children[0] if child.children else None
-                if string_node and string_node.type == "string":
-                    return (
-                        source[string_node.start_byte : string_node.end_byte]
-                        .decode("utf-8")
-                        .strip("\"'")
-                    )
-        return None
-
-    def _extract_class_docstring(self, node: Node, source: bytes) -> Optional[str]:
-        """Extract class docstring."""
-        body_node = node.child_by_field_name("body")
-        if not body_node:
-            return None
-
-        for child in body_node.children:
-            if child.type == "expression_statement":
-                string_node = child.children[0] if child.children else None
-                if string_node and string_node.type == "string":
-                    return (
-                        source[string_node.start_byte : string_node.end_byte]
-                        .decode("utf-8")
-                        .strip("\"'")
-                    )
-        return None
+    def _extract_docstring(self, node: Node, source: bytes) -> Optional[str]:
+        """Extract docstring from a function or class node."""
+        body_node = node.child_by_field_name("body")
+        if not body_node:
+            return None
+
+        for child in body_node.children:
+            if child.type == "expression_statement":
+                string_node = child.children[0] if child.children else None
+                if string_node and string_node.type == "string":
+                    return (
+                        source[string_node.start_byte : string_node.end_byte]
+                        .decode("utf-8")
+                        .strip("\"'")
+                    )
+        return None

Update callers at lines 155 and 184:

docstring = self._extract_docstring(node, source)
refactron/analyzers/security_analyzer.py-401-417 (1)

401-417: ⚠️ Potential issue | 🟠 Major

Double-reporting and false positives for user-defined functions named system or popen

The bare names in always_shell cause two issues:

  1. Double-reporting: A bare system() or popen() call triggers both SEC001 (DANGEROUS_FUNCTIONS check) and SEC0051 (always_shell check), reporting the same vulnerability twice.

  2. False positives: User-defined functions named system or popen will match both SEC001 and SEC0051, incorrectly flagging them as dangerous. The _get_function_name() method returns just the attribute name (e.g., "system" from both system() and os.system), and _get_full_function_name() also returns "system" for bare calls, so both checks match unqualified bare names.

Consider removing "system" and "popen" from always_shell since SEC001 already catches these via DANGEROUS_FUNCTIONS. The always_shell check should focus on qualified names like "os.system" and "os.popen" to avoid these collisions.

complex_test_repo/data/processor.py-34-49 (1)

34-49: ⚠️ Potential issue | 🟠 Major

deep_nesting_example references multiple undefined names — causes pipeline failure and NameError at runtime.

default_value, meets_requirement_1, early_result_1, meets_requirement_2, early_result_2, and perform_main_operation are all undefined (Ruff F821). This function will raise NameError if called. Parameters b, c, d are also unused.

If this is intended as a pseudocode-style example for pattern demonstration, consider either:

  1. Adding stub definitions for the referenced names, or
  2. Prefixing unused args with _ and adding a # noqa: F821 comment, or
  3. Wrapping the body with raise NotImplementedError to make it clear this is a template.
refactron/core/workspace.py-204-221 (1)

204-221: ⚠️ Potential issue | 🟠 Major

Incomplete URL sanitization — may match spoofed hostnames.

The CodeQL findings here are valid. "github.com" in url (Line 213) matches URLs like https://notgithub.com/... or https://evil.com/github.com/.... Similarly, url.split("github.com/")[1] on Line 217 can split at the wrong position.

Additionally, .replace(".git", "") on Lines 215 and 217 replaces all occurrences of .git anywhere in the string, not just a trailing suffix (e.g., a repo named my.github-tool would be mangled).

Consider using proper URL parsing to anchor matches to the hostname and strip only a trailing .git:

🔒 Proposed fix
-            # Parse the remote URL (support both HTTPS and SSH)
-            for line in content.split("\n"):
-                line = line.strip()
-                if line.startswith("url = "):
-                    url = line.replace("url = ", "")
-
-                    # Extract repo name from URL
-                    # HTTPS: https://github.com/user/repo.git
-                    # SSH: git@github.com:user/repo.git
-                    if "github.com" in url:
-                        if url.startswith("git@github.com:"):
-                            repo_path = url.replace("git@github.com:", "").replace(".git", "")
-                        elif "github.com/" in url:
-                            repo_path = url.split("github.com/")[1].replace(".git", "")
-                        else:
-                            continue
-
-                        return repo_path
+            from urllib.parse import urlparse
+
+            for line in content.split("\n"):
+                line = line.strip()
+                if line.startswith("url = "):
+                    url = line.replace("url = ", "").strip()
+
+                    repo_path = None
+                    if url.startswith("git@github.com:"):
+                        repo_path = url[len("git@github.com:"):]
+                    else:
+                        parsed = urlparse(url)
+                        if parsed.hostname == "github.com" and parsed.path:
+                            repo_path = parsed.path.lstrip("/")
+
+                    if repo_path:
+                        if repo_path.endswith(".git"):
+                            repo_path = repo_path[:-4]
+                        if "/" in repo_path:
+                            return repo_path
refactron/rag/retriever.py-68-73 (1)

68-73: 🛠️ Refactor suggestion | 🟠 Major

Chain the original exception for better debugging.

When re-raising as RuntimeError, the original exception context is lost. Use raise ... from to preserve the traceback chain.

Proposed fix
         try:
             self.collection = self.client.get_collection(name=collection_name)
-        except Exception:
+        except Exception as e:
             raise RuntimeError(
                 f"Index not found at {self.index_path}. Run 'refactron rag index' first."
-            )
+            ) from e
refactron/rag/indexer.py-40-46 (1)

40-46: ⚠️ Potential issue | 🟠 Major

GroqClient is undefined at the annotation site — causes linting failure.

GroqClient is used in the type hint on line 45 but is never imported at module scope. While from __future__ import annotations defers evaluation (so this won't crash at runtime), it fails linting (F821) and breaks introspection tools. The local import at line 65 doesn't help since it's scoped to the method body. Either add a TYPE_CHECKING guarded import or use a string literal annotation.

Proposed fix
 from typing import List, Optional
 
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from refactron.llm.client import GroqClient
+
 try:
     import chromadb

And remove the now-unnecessary local import at line 65:

-        # Initialize LLM client for summarization
-        from refactron.llm.client import GroqClient
-
         self.llm_client = llm_client
refactron/rag/indexer.py-239-250 (1)

239-250: ⚠️ Potential issue | 🟠 Major

Replace collection.add() with collection.upsert() to support re-indexing.

The chunk ID format {file_path}:{line_start}-{line_end} is deterministic. Re-indexing the same file will produce duplicate IDs, which causes DuplicateIDError in ChromaDB's collection.add(). Use collection.upsert() instead, which updates existing entries when IDs already exist in the collection.

Proposed fix
-        self.collection.add(
+        self.collection.upsert(
             documents=documents,
             metadatas=metadatas,
             ids=ids,
             embeddings=embeddings,
         )
refactron/llm/orchestrator.py-94-108 (1)

94-108: ⚠️ Potential issue | 🟠 Major

Numeric confidence values > 1.0 from the LLM (e.g., 85) are silently clamped.

When the LLM returns a numeric confidence_score like 85 (intending 85%), the code takes float(85) and later clamps it to 1.0 on Line 118. This differs from the string path where "85%" is correctly normalized to 0.85. Consider normalizing numeric values > 1.0 the same way.

Proposed fix
                 else:
                     confidence = float(raw_confidence)
+                    # Normalize if LLM returned a percentage as a number (e.g., 85 instead of 0.85)
+                    if confidence > 1.0:
+                        confidence = confidence / 100.0
refactron/cli.py-1233-1263 (1)

1233-1263: 🛠️ Refactor suggestion | 🟠 Major

Dead code: the "target_path" not in locals() check on Line 1262 is unreachable.

In all code paths reaching Line 1262, target_path is already assigned (either from the interactive selector on Line 1243 or _validate_path on Line 1259). Using locals() for control flow is fragile and confusing. Remove the dead check.

Proposed fix
     else:
         # Path explicitly provided, validate and use it
         target_path = _validate_path(target)
 
-    # Setup (only if not already set by interactive selector)
-    if "target_path" not in locals():
-        target_path = _validate_path(target)
     cfg = _load_config(config, profile, environment)
refactron/cli.py-2770-2791 (1)

2770-2791: ⚠️ Potential issue | 🟠 Major

Validate clone_url before passing it to git clone.

matching_repo.clone_url originates from an API response. While the list-form subprocess.run avoids shell injection, a compromised or unexpected API response could supply a malicious URL (e.g., a local path, file:// URI, or SSH command). Consider validating that clone_url starts with https:// before cloning.

Proposed fix
+        if not matching_repo.clone_url.startswith("https://"):
+            console.print(
+                f"[red]Refusing to clone non-HTTPS URL: {matching_repo.clone_url}[/red]"
+            )
+            raise SystemExit(1)
+
         console.print(f"[primary]Cloning {matching_repo.full_name}...[/primary]\n")
🟡 Minor comments (28)
tests/test_patterns_integration.py-262-271 (1)

262-271: ⚠️ Potential issue | 🟡 Minor

Remove unused variables and duplicate assertion (pipeline failure: F841).

After removing the cross-project pattern_hash comparison, pattern_hashes1 and pattern_hashes2 (lines 262–263) are now dead code, which is the cause of the flake8 F841 pipeline failure. Additionally, line 271 is an exact duplicate of the assertion on line 268.

Proposed fix
-            pattern_hashes1 = {p.pattern_hash for p in patterns1.values()}
-            pattern_hashes2 = {p.pattern_hash for p in patterns2.values()}
-
             # Verify the key property: storage directories are different (isolation works)
             # Note: Anonymized fingerprinting may make structurally similar code have same hash
             # which is correct behavior - the test should check storage isolation
             assert refactron1.pattern_storage.storage_dir != refactron2.pattern_storage.storage_dir
-
-            # Verify storage directories are separate (isolation mechanism)
-            assert refactron1.pattern_storage.storage_dir != refactron2.pattern_storage.storage_dir
tests/test_analyzer_edge_cases.py-784-784 (1)

784-784: ⚠️ Potential issue | 🟡 Minor

Same SEC0052 rule ID concern — verify against the analyzer implementation.

This is the same potentially incorrect rule ID change (SEC005SEC0052) seen in tests/test_false_positive_reduction.py. The test name test_command_injection_os_system and docstring reference shell=True, which in the other file maps to a "shell=True has high confidence (0.95)" comment — all three assertions should use the same, correct rule ID.

scripts/analyze_feedback_data.py-40-42 (1)

40-42: ⚠️ Potential issue | 🟡 Minor

all_patterns.update(patterns) silently drops duplicate pattern IDs across projects.

If two storage directories contain patterns with the same key, only the last one survives. This skews total_patterns in the report (it shows unique keys, not the true aggregate count). If the intent is to count all patterns, track the total via a running counter; if deduplication is intended, this is fine but should be documented.

scripts/analyze_feedback_data.py-16-19 (1)

16-19: ⚠️ Potential issue | 🟡 Minor

sys.path manipulation causes pipeline failure (E402).

The sys.path.insert on line 17 forces the refactron import below the top-of-file level, triggering flake8 E402. Since this lives under scripts/, consider making the package installable (e.g., pip install -e .) or adding a # noqa: E402 if the path hack is intentional. Alternatively, restructure so the import doesn't need the hack.

Quick fix if keeping the path hack
 sys.path.insert(0, str(Path(__file__).parent.parent))
 
-from refactron.patterns.storage import PatternStorage
+from refactron.patterns.storage import PatternStorage  # noqa: E402
scripts/analyze_feedback_data.py-54-97 (1)

54-97: ⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefixes — flagged by linter and failing the pipeline.

Lines 55, 58, 64, 71, 80, 87, 91, 94, 95, and 97 use f"..." with no interpolation placeholders. These trigger F541 errors in both Ruff and flake8 and are part of the current CI failure.

Proposed fix (representative subset)
-    print(f"\n{'='*60}")
-    print(f"AGGREGATE STATISTICS")
-    print(f"{'='*60}\n")
+    print(f"\n{'='*60}")
+    print("AGGREGATE STATISTICS")
+    print(f"{'='*60}\n")
 
-    print(f"📊 Total Records:")
+    print("📊 Total Records:")

Apply the same pattern to lines 64, 71, 80, 87, 91, 94, 95, and 97 — remove f where no {} expressions are present.

complex_test_repo/utils/string_helper.py-1-3 (1)

1-3: ⚠️ Potential issue | 🟡 Minor

Unused imports sys and os are failing the pipeline.

The pre-commit pipeline reports flake8: F401 for both imports. Same recommendation as math_lib.py — either add # noqa: F401 suppressions or exclude complex_test_repo/ from flake8.

Proposed fix
-import sys
-import os
+import sys  # noqa: F401
+import os  # noqa: F401
complex_test_repo/utils/math_lib.py-1-2 (1)

1-2: ⚠️ Potential issue | 🟡 Minor

Unused import os is failing the pipeline.

The pre-commit pipeline reports flake8: F401 'os' imported but unused. If this file is an intentional test fixture with code smells, consider adding a # noqa: F401 suppression or excluding complex_test_repo/ from flake8 in your pre-commit config so the pipeline passes.

Proposed fix
-import os
+import os  # noqa: F401

Or remove the import entirely if it's not needed for the test scenario.

refactron/llm/models.py-5-5 (1)

5-5: ⚠️ Potential issue | 🟡 Minor

Remove unused imports Dict and Any to fix pipeline.

These are flagged as F401 by flake8 in CI.

Proposed fix
-from typing import List, Optional, Dict, Any
+from typing import List, Optional
refactron/llm/models.py-61-66 (1)

61-66: ⚠️ Potential issue | 🟡 Minor

Add return type annotation to __post_init__ to satisfy mypy.

The pipeline reports no-untyped-def errors from mypy.

Proposed fix
-    def __post_init__(self):
+    def __post_init__(self) -> None:
refactron/llm/backend_client.py-5-6 (1)

5-6: ⚠️ Potential issue | 🟡 Minor

Remove unused imports flagged by pipeline.

os, Any, and Dict are imported but unused, causing flake8 F401 errors in CI.

Proposed fix
-import os
-from typing import Any, Dict, Optional
+from typing import Optional
refactron/analyzers/security_analyzer.py-413-413 (1)

413-413: ⚠️ Potential issue | 🟡 Minor

Inconsistent rule ID format: SEC0051 / SEC0052 vs SEC001SEC013.

All other rule IDs in this analyzer use a 3-digit suffix (e.g., SEC001, SEC012). The new command-injection rules use 4-digit suffixes (SEC0051, SEC0052), which will be inconsistent with whitelist patterns and any downstream tooling that expects a fixed format. Consider SEC005a/SEC005b or SEC014/SEC015 instead.

Also applies to: 436-436

refactron/rag/parser.py-106-110 (1)

106-110: ⚠️ Potential issue | 🟡 Minor

.strip("\"'") is imprecise for docstring extraction.

strip("\"'") removes all leading and trailing quote characters individually, not quote delimiters as a unit. For a triple-quoted docstring like """Hello "world"!""", this produces Hello "world"! — correct. But for edge cases like """'Hello'""", it would strip the inner quotes at the boundaries too, yielding Hello instead of 'Hello'. Consider stripping only the known delimiters (""" or ''') explicitly.

refactron/llm/safety.py-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Unused import Optional — causes pipeline failure (F401).

Optional is imported but never used in this file. This triggers the flake8 F401 error reported in the pipeline.

Proposed fix
-from typing import List, Optional
+from typing import List
refactron/rag/parser.py-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Unused imports Any and Dict — causes pipeline failure (F401).

Proposed fix
-from typing import Any, Dict, List, Optional, Tuple
+from typing import List, Optional, Tuple
refactron/llm/prompts.py-53-53 (1)

53-53: ⚠️ Potential issue | 🟡 Minor

Typo: "file implementation" should likely be "file manipulation" or "file I/O".

In the context of identifying dangerous side effects, "file implementation" doesn't make sense. This should probably read "file manipulation" or "file system operations".

Proposed fix
-3. Dangerous side effects (file implementation, network calls)
+3. Dangerous side effects (file manipulation, network calls)
refactron/core/workspace.py-195-221 (1)

195-221: ⚠️ Potential issue | 🟡 Minor

detect_repository matches any url = line, not specifically from [remote "origin"].

The .git/config file can contain multiple [remote "..."] sections. This code returns the first url = line encountered, which could belong to a non-origin remote (e.g., a fork or upstream). Consider at minimum tracking whether you're inside a [remote "origin"] section, or using git config --get remote.origin.url via subprocess as a more robust alternative.

tests/test_backend_client.py-57-57 (1)

57-57: ⚠️ Potential issue | 🟡 Minor

Invalid escape sequences \( and \) — causes pipeline failure (W605).

Use a raw string for the regex match argument to avoid invalid escape sequence warnings.

🔧 Proposed fix
-    with pytest.raises(RuntimeError, match="Backend LLM proxy error \(500\): Internal Server Error"):
+    with pytest.raises(RuntimeError, match=r"Backend LLM proxy error \(500\): Internal Server Error"):
tests/test_llm_orchestrator.py-3-3 (1)

3-3: ⚠️ Potential issue | 🟡 Minor

Unused import json — causes pipeline failure.

json is not used anywhere in this file. Remove it to fix the flake8 F401 error in CI.

🔧 Proposed fix
-import json
 from unittest.mock import Mock, MagicMock
tests/test_backend_client.py-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Unused import requests — causes pipeline failure.

The requests module is not used directly; patches reference it by string path. Remove this import to fix the flake8 F401 error in CI.

🔧 Proposed fix
 import pytest
-import requests
 from unittest.mock import MagicMock, patch
tests/test_backend_client.py-20-23 (1)

20-23: ⚠️ Potential issue | 🟡 Minor

test_backend_client_initialization does not mock load_credentials.

This test constructs BackendLLMClient without using the mock_credentials fixture, unlike test_backend_client_generate and test_backend_client_error_handling. Since BackendLLMClient.__init__ calls load_credentials() (line 38), the test will invoke the real function. While load_credentials() gracefully returns None if credentials don't exist and won't cause the test to fail, it violates test isolation by depending on external state. For consistency and proper test isolation, use the mock_credentials fixture.

complex_test_repo/data/processor.py-2-2 (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Unused import time — causes pipeline failure.

The import time is unused and flagged by flake8 in the CI pipeline. Remove it to fix the pre-commit check.

🔧 Proposed fix
-import time
-
+
refactron/rag/indexer.py-22-23 (1)

22-23: ⚠️ Potential issue | 🟡 Minor

Remove unused import ParsedFile.

Pipeline reports F401: 'refactron.rag.parser.ParsedFile' imported but unused.

Proposed fix
-from refactron.rag.parser import CodeParser, ParsedFile
+from refactron.rag.parser import CodeParser
complex_test_repo/main.py-10-21 (1)

10-21: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: missing type annotations.

The pre-commit pipeline reports mypy: function missing return type annotation and flake8: no-untyped-def. Add a return type annotation to run().

Proposed fix
-def run():
+def run() -> None:
tests/test_rag_indexer.py-7-8 (1)

7-8: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: remove unused imports.

The CI reports flake8: F401 for unused imports. sys and create_autospec are imported but never used.

Proposed fix
-from unittest.mock import Mock, MagicMock, patch, create_autospec
-import sys
+from unittest.mock import Mock, MagicMock, patch
refactron/cli.py-3493-3498 (1)

3493-3498: ⚠️ Potential issue | 🟡 Minor

Existing documentation file is silently overwritten without warning.

If {stem}_doc.md already exists, write_text on Line 3495 overwrites it without warning the user, even in interactive mode. Consider checking for existence and informing the user.

refactron/llm/orchestrator.py-7-7 (1)

7-7: ⚠️ Potential issue | 🟡 Minor

Remove unused import List — this is causing a pipeline failure.

The List type from typing is imported but never used in this file. The CI pre-commit check (flake8 F401) is failing because of this.

Proposed fix
-from typing import Optional, List, Union
+from typing import Optional, Union
refactron/llm/orchestrator.py-205-216 (1)

205-216: ⚠️ Potential issue | 🟡 Minor

Potential IndexError in delimiter parsing if a tag appears at the end of the response.

If the LLM response ends with a bare @@@MARKDOWN@@@ (no trailing content), parts[i + 1] will raise IndexError. The outer try/except catches it, but this turns a partially parseable response into a full failure. Add a bounds check.

Proposed fix
             if "@@@EXPLANATION@@@" in response_text:
                 parts = response_text.split("@@@")
                 for i, part in enumerate(parts):
-                    if part == "EXPLANATION":
+                    if part == "EXPLANATION" and i + 1 < len(parts):
                         explanation = parts[i + 1].strip()
-                    elif part == "CONFIDENCE":
+                    elif part == "CONFIDENCE" and i + 1 < len(parts):
                         try:
                             confidence = float(parts[i + 1].strip())
                         except ValueError:
                             pass
-                    elif part == "MARKDOWN":
+                    elif part == "MARKDOWN" and i + 1 < len(parts):
                         proposed_code = parts[i + 1].strip()
refactron/cli.py-2975-2982 (1)

2975-2982: ⚠️ Potential issue | 🟡 Minor

Resource leak: open(os.devnull, "w") handles are never closed.

sys.stdout and sys.stderr are reassigned without closing the new file handles. While the process exits shortly after, this is poor practice and can cause issues with buffered output. Use os.devnull file handles properly.

Proposed fix
     if background:
         # Suppress all logging and output in background mode
         logging.getLogger().setLevel(logging.CRITICAL)
-        import os
-        import sys
-
-        sys.stdout = open(os.devnull, "w")
-        sys.stderr = open(os.devnull, "w")
+        devnull = open(os.devnull, "w")  # noqa: SIM115
+        sys.stdout = devnull
+        sys.stderr = devnull

Alternatively, since os and sys are already imported at the module level, the local re-imports are also unnecessary.

🧹 Nitpick comments (27)
refactron/core/refactor_result.py (1)

129-130: Bare except Exception swallows all errors without logging.

Per the Ruff BLE001 hint, catching blind Exception hides the root cause. At minimum, narrow to OSError and log/collect the error so callers can diagnose failures.

Proposed fix
-            except Exception:
+            except OSError:
                 success = False
scripts/analyze_feedback_data.py (4)

26-27: Glob rooted at CWD is fragile and potentially expensive.

Path('.') ties the script to whichever directory it's invoked from, and **/ can recurse into node_modules, .git, venv, etc. Consider accepting an optional CLI argument for the root path and/or excluding common non-project directories.


46-47: Broad except Exception swallows all errors silently.

This is a diagnostic script so the non-fatal approach is reasonable, but consider catching a narrower set (e.g., OSError, ValueError) or at least logging the traceback at debug level so storage corruption or code bugs aren't quietly hidden.


77-78: hasattr checks are likely unnecessary for typed model fields.

RefactoringFeedback (from refactron/patterns/models.py) defines code_pattern_hash and reason as declared fields. If the attribute is always present (possibly None), hasattr is redundant — a truthiness check suffices. If the schema genuinely varies, hasattr is a sign that the data model needs tightening.


115-117: Report is written to CWD with no error handling.

If the working directory isn't writable, this will raise an unhandled exception after all analysis work is done. A try/except around the write (or writing to a configurable/temp path) would make this more robust.

complex_test_repo/utils/math_lib.py (1)

18-18: Stale comment: "Magic numbers and no docstring" is inaccurate.

The constants have been extracted to SURCHARGE_RATE and CONSTANT_0_95, and the function has a docstring. This comment appears to be leftover from a prior iteration.

complex_test_repo/utils/string_helper.py (1)

5-27: clean_text and clean_content are functionally identical.

Both functions perform text.strip().lower(). If this duplication is intentional (as a test fixture for the refactoring tool to detect), consider adding a comment to that effect so future contributors don't "fix" it and break test scenarios.

pyproject.toml (1)

38-45: Heavy dependencies like sentence-transformers should be optional.

sentence-transformers pulls in PyTorch (~2 GB), making the base install very heavy for users who only need the core analysis features. Consider moving the RAG/LLM dependencies to [project.optional-dependencies] (e.g., rag or llm extras) so that the core package remains lightweight.

[project.optional-dependencies]
rag = [
    "chromadb>=0.4.22",
    "tree-sitter>=0.20.4",
    "tree-sitter-python>=0.20.4",
    "sentence-transformers>=2.5.1",
]
llm = [
    "groq>=0.4.0",
    "pydantic>=2.6.0",
]
refactron/llm/backend_client.py (1)

86-94: Content-Type check may miss charset suffixes.

Line 88 checks response.headers.get("Content-Type") == "application/json", but servers often return application/json; charset=utf-8. Use startswith or in for a more robust check.

Proposed fix
-                if response.headers.get("Content-Type") == "application/json":
+                content_type = response.headers.get("Content-Type", "")
+                if "application/json" in content_type:
refactron/patterns/fingerprint.py (1)

191-209: Mutating AST nodes in-place affects the tree passed by caller.

_anonymize_code parses its own tree from code, so this is safe currently. However, replacing ast.Constant.value with the string "CONST" changes the literal type (e.g., intstr), which may produce subtly different ast.unparse output (e.g., 'CONST' with quotes instead of a bare token). This is fine for fingerprinting purposes but worth a brief inline note.

refactron/llm/client.py (1)

75-82: No error handling around the Groq API call.

Unlike BackendClient.generate (in refactron/llm/backend_client.py, lines 80–103) which wraps the call in try/except with descriptive RuntimeError messages, GroqClient.generate lets any SDK exception propagate raw. Also, response.choices[0] will raise an IndexError if the API returns an empty choices list.

Proposed fix
-        response = self.client.chat.completions.create(
-            model=self.model,
-            messages=messages,
-            temperature=temperature if temperature is not None else self.temperature,
-            max_tokens=max_tokens if max_tokens is not None else self.max_tokens,
-        )
-
-        return response.choices[0].message.content
+        try:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                temperature=temperature if temperature is not None else self.temperature,
+                max_tokens=max_tokens if max_tokens is not None else self.max_tokens,
+            )
+        except Exception as e:
+            raise RuntimeError(f"Groq API call failed: {e}") from e
+
+        if not response.choices:
+            raise RuntimeError("Groq API returned empty choices")
+
+        return response.choices[0].message.content
refactron/analyzers/security_analyzer.py (2)

130-131: _alias_map set on self creates implicit method-ordering dependency.

_alias_map is assigned to self inside analyze(), but _get_full_function_name (and all check methods that call it, like _check_command_injection) depend on it. If any of these are called outside of analyze() (e.g., in tests or a future refactor), they'll raise AttributeError. Consider initializing _alias_map in __init__ or passing it as a parameter.


175-188: _build_alias_map doesn't use self — could be a @staticmethod.

This is a pure function that takes tree and returns a dict. Making it a @staticmethod clarifies that it has no side effects on the instance.

refactron/core/repositories.py (1)

133-177: Missing exception chaining (from e / from None) on re-raised exceptions.

All raise RuntimeError(...) inside except blocks (lines 143, 153, 157, 168, 171, 174, 177) lose the original traceback context. Use raise ... from e to preserve it, or raise ... from None to explicitly suppress it. This was flagged by static analysis (B904).

Example fix for a few cases
-            raise RuntimeError(
-                f"Authentication failed (HTTP 401): {detail}\n\n"
+            raise RuntimeError(
+                f"Authentication failed (HTTP 401): {detail}\n\n"
                 ...
-            )
+            ) from e
         elif e.code == 403:
             raise RuntimeError(
                 "GitHub access denied. ..."
-            )
+            ) from e
refactron/llm/safety.py (1)

67-87: String-based risk scanning is prone to false positives from comments and string literals.

_assess_risk uses simple keyword in code substring checks, which will match inside comments, docstrings, and string literals. For example, a docstring saying "use subprocess for..." or a string like "open(" would trigger a risk penalty. Consider parsing the AST and checking only actual code nodes, or accept this as a known limitation with a TODO.

refactron/rag/parser.py (1)

74-75: No error handling for file I/O.

open(file_path, "rb") will raise FileNotFoundError or PermissionError with no handling. Since this is consumed by CodeChunker.chunk_file and potentially batch operations, an unhandled exception here could abort an entire indexing run. Consider catching and wrapping, or at minimum documenting that callers must handle it.

refactron/rag/chunker.py (1)

78-89: Module chunk line_range is approximate.

line_range=(1, len(parsed_file.imports) + 1) doesn't account for the module docstring lines or blank lines between the docstring and imports. If the docstring spans multiple lines, the range will be too short. Consider deriving the range from the actual parsed positions if available.

refactron/core/workspace.py (1)

85-92: File permissions race: config is world-readable between open() and chmod().

The file is created with default umask permissions, then chmod is applied afterward. On a shared system, there's a brief window where the file could be read by others. For sensitive config, consider setting the umask before writing or opening with os.open + os.fdopen with explicit mode 0o600.

This is low risk for a CLI config file with workspace mappings (no secrets), but worth noting since the comment explicitly calls out restrictive permissions.

refactron/rag/__init__.py (1)

6-6: __all__ is not sorted (Ruff RUF022).

Static analysis flags that __all__ entries should be sorted alphabetically.

🔧 Proposed fix
-__all__ = ["RAGIndexer", "ContextRetriever"]
+__all__ = ["ContextRetriever", "RAGIndexer"]
complex_test_repo/main.py (1)

6-8: Constants lack semantic meaning.

THRESHOLD_10 = 10, THRESHOLD_20 = 20, and CONSTANT_3 = 3 are named after their values, providing no domain context. If these are thresholds for something specific, name them accordingly (e.g., MIN_THRESHOLD, MAX_THRESHOLD). Otherwise they add indirection with no benefit.

refactron/rag/indexer.py (1)

68-72: Exception chaining missing in __init__ error paths.

Per Ruff B904, when re-raising exceptions within except blocks (like in retriever.py line 70-73), use raise ... from err to preserve the original traceback. The same pattern should be applied here for consistency, though this file doesn't have that exact pattern. The broad Exception catches on lines 103, 135, 181, 208 are acceptable given these are optional/fallback paths.

refactron/rag/retriever.py (1)

22-32: Consider tightening the line_range and metadata type hints.

line_range: tuple and metadata: dict lose type information. Using Tuple[int, int] and Dict[str, Any] would improve IDE support and static analysis.

refactron/llm/orchestrator.py (1)

122-124: Prefer logger.exception over logger.error to preserve the traceback.

In exception handlers (Lines 123, 153, 235), logger.error loses the stack trace. Using logger.exception automatically includes it, which significantly aids debugging LLM and safety validation failures.

Proposed fix (example for line 123)
         except Exception as e:
-            logger.error(f"LLM generation failed: {e}")
+            logger.exception("LLM generation failed")
refactron/cli.py (4)

1098-1107: Performance concern: rglob("*.py") is fully materialized for each subdirectory just to get a count.

For large monorepos, iterating all .py files recursively in every immediate subdirectory can be noticeably slow. Consider using a generator with a counter or sum(1 for _ in item.rglob("*.py")) — though that doesn't avoid the traversal. At minimum, consider adding a console.status spinner around this block.


2806-2821: Duplicate comment and redundant re-import of subprocess and sys.

Lines 2811-2812 contain a duplicate comment. Also, subprocess was already imported earlier in this same function (Line 2773), and sys is a module-level import (Line 5).

Proposed fix
     # Trigger background indexing via subprocess
     # We spawn a separate process so it survives after this CLI command exits
-    import subprocess
-    import sys
 
     console.print("[dim]Spawning background indexer...[/dim]")
     try:
         # Run 'refactron rag index' in the background
-        # Run 'refactron rag index' in the background
         # We redirect output to DEVNULL to keep it quiet

3366-3408: Apply logic may corrupt the file if the LLM returns a different number of lines than the original snippet.

When --apply and --line are used, the code replaces lines start_line_idx:end_line_idx with the LLM's proposed code (Line 3392-3395). If the proposed code has a different line count than the original window, adjacent code is preserved correctly, but this assumption should be communicated to the user. Also, there's a minor TOCTOU: code (Line 3298) and current_lines (Line 3385) are from separate reads.

Consider warning the user about potential structural changes and performing a single read.


3080-3103: Silent except Exception: pass swallows all reranking errors without any diagnostic output.

When --rerank is used, failures (API errors, auth issues, timeouts) are completely invisible. At minimum, log at debug level so users with --verbose can diagnose issues.

Proposed fix
                 except Exception:
-                    pass  # Fallback to distance-based score
+                    logger.debug("AI reranking failed for result %d, using distance-based score", i)

Comment on lines +108 to +109
if not file_path.exists():
continue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Non-existent files are silently treated as successful.

When file_path.exists() is false, the method continues without setting success = False. This means missing target files are reported as successfully applied, which is incorrect — the caller has no way to know operations were skipped.

Proposed fix
                 if not file_path.exists():
+                    success = False
                     continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not file_path.exists():
continue
if not file_path.exists():
success = False
continue
🤖 Prompt for AI Agents
In `@refactron/core/refactor_result.py` around lines 108 - 109, The loop currently
skips non-existent files by "continue" when "file_path.exists()" is false, which
leaves the overall "success" flag unchanged; update the block in
refactor_result.py so that when "file_path.exists()" is false you set "success =
False" (and optionally record an error message for that "file_path") before
continuing, ensuring the function (and its "success" variable) correctly
reflects skipped/missing targets.

Comment on lines +118 to +124
for op in ops:
if op.old_code in new_content:
new_content = new_content.replace(op.old_code, op.new_code, 1)
else:
# Fallback: try to find by line number if exact match fails
# This part is omitted for simplicity in this version
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Operations that fail to match old_code are silently ignored, yet success remains True.

If op.old_code is not found in the current content (e.g., the file was edited, or the code drifted), the operation is simply skipped. The method still returns True, misleading the caller into thinking all operations were applied. This is a correctness bug.

Proposed fix — track per-operation failure
                 for op in ops:
                     if op.old_code in new_content:
                         new_content = new_content.replace(op.old_code, op.new_code, 1)
                     else:
-                        # Fallback: try to find by line number if exact match fails
-                        # This part is omitted for simplicity in this version
-                        pass
+                        # old_code not found — mark as failed
+                        success = False
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for op in ops:
if op.old_code in new_content:
new_content = new_content.replace(op.old_code, op.new_code, 1)
else:
# Fallback: try to find by line number if exact match fails
# This part is omitted for simplicity in this version
pass
for op in ops:
if op.old_code in new_content:
new_content = new_content.replace(op.old_code, op.new_code, 1)
else:
# old_code not found — mark as failed
success = False
🤖 Prompt for AI Agents
In `@refactron/core/refactor_result.py` around lines 118 - 124, The loop that
applies edits (for op in ops) currently skips operations when op.old_code is not
found and still leaves the overall success flag True; update the logic in the
method in refactor_result.py so it tracks per-operation failures: when
op.old_code is not found in new_content, mark a local flag (or set success =
False) and record the failed op (e.g., collect op or its index) rather than
silently passing, and ensure the returned success reflects whether all ops were
applied; keep the replacement behavior (new_content.replace(..., 1)) but change
the else branch to flip success to False and optionally append the failed op to
a failure list for callers to inspect.

Comment on lines +74 to +93
from datetime import datetime, timezone

if creds.expires_at:
try:
# Parse the expiration time
from datetime import datetime

if isinstance(creds.expires_at, str):
# Remove timezone info for comparison
expires_str = creds.expires_at.replace("+00:00", "").replace("Z", "")
expires_at = datetime.fromisoformat(expires_str).replace(tzinfo=timezone.utc)
else:
expires_at = creds.expires_at

now = datetime.now(timezone.utc)
if now >= expires_at:
raise RuntimeError("Your session has expired. Please run 'refactron login' again.")
except (ValueError, AttributeError):
# If we can't parse the expiration, continue anyway
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Duplicate datetime import causes pipeline failure (F811); timezone handling is fragile.

Line 74 imports from datetime import datetime, timezone, then line 79 re-imports from datetime import datetime — this triggers the flake8 F811 error reported in the pipeline. Remove the redundant import on line 79.

Additionally, the timezone-stripping logic (line 83) only handles "+00:00" and "Z" but not arbitrary offsets. Python 3.7+ datetime.fromisoformat (3.11+ handles Z natively) could be used more robustly.

Proposed fix
     # Check if token is expired
     from datetime import datetime, timezone
 
     if creds.expires_at:
         try:
-            # Parse the expiration time
-            from datetime import datetime
-
             if isinstance(creds.expires_at, str):
-                # Remove timezone info for comparison
-                expires_str = creds.expires_at.replace("+00:00", "").replace("Z", "")
-                expires_at = datetime.fromisoformat(expires_str).replace(tzinfo=timezone.utc)
+                expires_str = creds.expires_at.replace("Z", "+00:00")
+                expires_at = datetime.fromisoformat(expires_str)
             else:
                 expires_at = creds.expires_at
 
             now = datetime.now(timezone.utc)
             if now >= expires_at:
                 raise RuntimeError("Your session has expired. Please run 'refactron login' again.")
         except (ValueError, AttributeError):
             # If we can't parse the expiration, continue anyway
             pass
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 90-90: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@refactron/core/repositories.py` around lines 74 - 93, Remove the redundant
"from datetime import datetime" re-import and instead use the already imported
datetime and timezone; for parsing creds.expires_at when it's a string, replace
only a trailing "Z" with "+00:00" (to support ISO Zulu) and call
datetime.fromisoformat(expires_str) so arbitrary offsets are preserved, then if
the resulting expires_at has no tzinfo attach timezone.utc before comparing to
datetime.now(timezone.utc); keep the try/except around ValueError/AttributeError
and raise the same RuntimeError when now >= expires_at.

Comment on lines +75 to +80
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=temperature or self.temperature,
max_tokens=max_tokens or self.max_tokens,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

temperature or self.temperature silently ignores temperature=0.0.

0.0 is falsy in Python, so passing temperature=0.0 (a valid and common value for deterministic output) will fall through to self.temperature. The same issue affects max_tokens=0, though that's less likely in practice. The same pattern exists in refactron/llm/backend_client.py (line 75).

Proposed fix
         response = self.client.chat.completions.create(
             model=self.model,
             messages=messages,
-            temperature=temperature or self.temperature,
-            max_tokens=max_tokens or self.max_tokens,
+            temperature=temperature if temperature is not None else self.temperature,
+            max_tokens=max_tokens if max_tokens is not None else self.max_tokens,
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=temperature or self.temperature,
max_tokens=max_tokens or self.max_tokens,
)
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
temperature=temperature if temperature is not None else self.temperature,
max_tokens=max_tokens if max_tokens is not None else self.max_tokens,
)
🤖 Prompt for AI Agents
In `@refactron/llm/client.py` around lines 75 - 80, The call to
self.client.chat.completions.create uses "temperature or self.temperature" and
"max_tokens or self.max_tokens", which treats valid values like 0.0
(temperature) or 0 (max_tokens) as falsy and silently substitutes the defaults;
change these to explicit None checks—e.g. pass temperature if temperature is not
None else self.temperature, and similarly for max_tokens—to preserve intentional
zero values; apply the same fix in the corresponding backend_client.py site
where the same pattern is used (search for self.client.chat.completions.create
and the temperature/max_tokens arguments).

Comment on lines +56 to +68
# 1. Retrieve Context
context_snippets = []
if self.retriever:
try:
# Search for similar code or relevant context
results = self.retriever.retrieve_similar(
f"{issue.message} {original_code}", top_k=3
)
context_snippets = [r.content for r in results]
except Exception as e:
logger.warning(f"Context retrieval failed: {e}")

rag_context = "\n\n".join(context_snippets) if context_snippets else "No context available."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

NameError when context retrieval fails: results is referenced but may not be defined.

If self.retriever is set but retrieve_similar raises an exception (caught on Line 65), the variable results is never assigned. However, Line 113 checks if self.retriever (still truthy) and tries to iterate results, causing a NameError at runtime.

Proposed fix
         # 1. Retrieve Context
         context_snippets = []
+        results = []
         if self.retriever:
             try:
                 # Search for similar code or relevant context
                 results = self.retriever.retrieve_similar(
                     f"{issue.message} {original_code}", top_k=3
                 )
                 context_snippets = [r.content for r in results]
             except Exception as e:
                 logger.warning(f"Context retrieval failed: {e}")

And on Line 113 simplify to always use results:

-                context_files=[r.file_path for r in results] if self.retriever else [],
+                context_files=[r.file_path for r in results],

Also applies to: 110-113

🧰 Tools
🪛 Ruff (0.14.14)

[warning] 65-65: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@refactron/llm/orchestrator.py` around lines 56 - 68, The NameError occurs
because results may be undefined if self.retriever.retrieve_similar raises;
initialize results (e.g., results = []) before the try block and populate
context_snippets from results only after safe assignment, and update any later
checks that iterate results to use the already-populated context_snippets or the
initialized results variable (references: self.retriever, retrieve_similar,
results, context_snippets, rag_context) so exceptions leave a defined empty
fallback and no NameError occurs.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Release v1.0.15 introduces Retrieval-Augmented Generation (RAG) indexing/retrieval plus LLM-driven suggestion/documentation flows, expands CLI commands for repo/workspace management, and updates pattern fingerprinting + security detection behavior.

Changes:

  • Add RAG infrastructure (tree-sitter parser, chunker, ChromaDB indexer, retriever) and related CLI commands.
  • Add LLM integration (Groq + backend proxy clients, orchestrator, safety gate, prompts) and new CLI “suggest”/“document” commands.
  • Update pattern fingerprinting (anonymized/structural hashing) and refine security analyzer rule IDs + alias tracking; add/adjust tests.

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 33 comments.

Show a summary per file
File Description
tests/test_rag_retriever.py Adds unit tests for RAG retrieval behavior.
tests/test_rag_parser.py Adds unit tests for tree-sitter based parsing.
tests/test_rag_indexer.py Adds unit tests for ChromaDB indexing flow.
tests/test_rag_chunker.py Adds unit tests for code chunk generation.
tests/test_patterns_ranker.py Adjusts ranker test fixtures for new fingerprinting behavior.
tests/test_patterns_learner.py Updates learner expectations due to anonymized hashing.
tests/test_patterns_integration.py Revises integration test to validate storage isolation.
tests/test_patterns_fingerprint.py Updates fingerprint tests for anonymization + AST patterns.
tests/test_llm_orchestrator.py Adds orchestrator tests for suggestion generation + RAG usage.
tests/test_groq_client.py Adds Groq client unit tests.
tests/test_false_positive_reduction.py Updates security rule IDs in tests.
tests/test_backend_client.py Adds backend LLM proxy client unit tests.
tests/test_analyzer_edge_cases.py Updates edge-case security assertions for new rule IDs.
scripts/analyze_feedback_data.py Adds script to summarize stored feedback/pattern data.
refactron/rag/retriever.py Implements context retrieval from ChromaDB index.
refactron/rag/parser.py Implements tree-sitter based Python parsing.
refactron/rag/indexer.py Implements repository indexing into ChromaDB with optional AI summaries.
refactron/rag/chunker.py Implements semantic chunk creation (module/function/class/method).
refactron/rag/init.py Exposes RAG public API surface.
refactron/patterns/fingerprint.py Switches to anonymized structural fingerprinting.
refactron/llm/safety.py Adds safety validation gate for LLM-generated code.
refactron/llm/prompts.py Adds prompt templates for suggestion/safety/doc generation.
refactron/llm/orchestrator.py Coordinates RAG + LLM generation and safety checks.
refactron/llm/models.py Adds dataclasses for suggestions and safety results.
refactron/llm/client.py Adds Groq API client wrapper.
refactron/llm/backend_client.py Adds backend proxy LLM client wrapper.
refactron/llm/init.py Exposes LLM public API surface.
refactron/core/workspace.py Adds workspace mapping management for repo/local path linking.
refactron/core/repositories.py Adds backend API integration to list connected GitHub repos.
refactron/core/refactron.py Renames Python-file discovery API and adds marker for .refactron.yaml.
refactron/core/refactor_result.py Implements “apply” of refactoring operations to disk.
refactron/core/models.py Adds DOCUMENTATION issue category.
refactron/core/config.py Excludes backup directory from analysis/refactor scanning.
refactron/cli.py Adds repo/rag commands, interactive target selection, suggest/document flows, version display updates.
refactron/analyzers/security_analyzer.py Adds alias tracking, new rule IDs, and broadened command injection checks.
refactron/init.py Version bump to 1.0.15.
pyproject.toml Version bump and adds RAG/LLM dependencies.
complex_test_repo/utils/string_helper.py Adds sample “complex” repo content (test fixture).
complex_test_repo/utils/math_lib.py Adds sample “complex” repo content (test fixture).
complex_test_repo/utils/init.py Adds sample “complex” repo content (test fixture).
complex_test_repo/main.py Adds sample “complex” repo content (test fixture).
complex_test_repo/data/processor.py Adds sample “complex” repo content (test fixture).
complex_test_repo/data/init.py Adds sample “complex” repo content (test fixture).
complex_test_repo/core/engine.py Adds sample “complex” repo content (test fixture).
complex_test_repo/core/init.py Adds sample “complex” repo content (test fixture).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +110 to +114
suggestion = RefactoringSuggestion(
issue=issue,
original_code=original_code,
context_files=[r.file_path for r in results] if self.retriever else [],
proposed_code=data.get("proposed_code", ""),
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

results can be undefined here if self.retriever is set but retrieve_similar() raises (the exception is swallowed earlier). That will raise an UnboundLocalError when building context_files. Initialize results = [] before the retrieval try/except (or set it in the except) and use that variable consistently.

Copilot uses AI. Check for mistakes.

# Initialize Python language - wrap capsule with Language
PY_LANGUAGE = Language(tspython.language())
self.parser = Parser(PY_LANGUAGE)
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parser(PY_LANGUAGE) is not compatible with the tree-sitter Python bindings for the version range declared in pyproject (tree-sitter>=0.20.4), where the usual API is Parser() followed by parser.set_language(...). As written, this will raise at runtime in many environments. Consider constructing the parser with Parser() and then setting the language explicitly.

Suggested change
self.parser = Parser(PY_LANGUAGE)
self.parser = Parser()
self.parser.set_language(PY_LANGUAGE)

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +59
3. Dangerous side effects (file implementation, network calls)
4. Breaking changes

Output valid JSON:
{
"safe": boolean,
"risk_score": float (0.0-1.0),
"risk_score": float (0.0-1.0),
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SAFETY_CHECK_PROMPT JSON schema example contains duplicate risk_score keys, which makes the example invalid/ambiguous and can confuse the model. Remove the duplicate key (and consider correcting “file implementation” to “file manipulation”).

Suggested change
3. Dangerous side effects (file implementation, network calls)
4. Breaking changes
Output valid JSON:
{
"safe": boolean,
"risk_score": float (0.0-1.0),
"risk_score": float (0.0-1.0),
3. Dangerous side effects (file manipulation, network calls)
4. Breaking changes
Output valid JSON:
{
"safe": boolean,
"risk_score": float (0.0-1.0),

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +82
return CodeChunk(
content=content,
chunk_type="module",
file_path=parsed_file.file_path,
line_range=(1, len(parsed_file.imports) + 1),
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line_range for the module chunk is derived from len(parsed_file.imports) + 1, which doesn’t reflect actual source line numbers (imports can span multiple lines and the module docstring length is ignored). Consider computing the range from the AST node positions (e.g., docstring/import nodes) so retrieval metadata points to the right lines.

Suggested change
return CodeChunk(
content=content,
chunk_type="module",
file_path=parsed_file.file_path,
line_range=(1, len(parsed_file.imports) + 1),
# Approximate the line range of the module docstring and imports.
# Start at the top of the file and count physical lines in the docstring
# and import statements, accounting for multi-line constructs.
start_line = 1
end_line = 1
if parsed_file.module_docstring:
# Number of lines in the docstring text
end_line += parsed_file.module_docstring.count("\n") + 1
if parsed_file.imports:
imports_text = "\n".join(parsed_file.imports)
if imports_text:
end_line += imports_text.count("\n") + 1
line_range = (start_line, end_line)
return CodeChunk(
content=content,
chunk_type="module",
file_path=parsed_file.file_path,
line_range=line_range,

Copilot uses AI. Check for mistakes.
Comment on lines +265 to 271
# Verify the key property: storage directories are different (isolation works)
# Note: Anonymized fingerprinting may make structurally similar code have same hash
# which is correct behavior - the test should check storage isolation
assert refactron1.pattern_storage.storage_dir != refactron2.pattern_storage.storage_dir

# Verify storage directories are separate (isolation mechanism)
assert refactron1.pattern_storage.storage_dir != refactron2.pattern_storage.storage_dir
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is duplicated below (the same condition is checked twice). Removing the redundant assert will reduce noise while keeping the isolation check.

Copilot uses AI. Check for mistakes.

tf_logging.set_verbosity_error()
except ImportError:
pass
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
logging.getLogger(__name__).debug(
"transformers package not installed; skipping transformers logging configuration"
)

Copilot uses AI. Check for mistakes.
try:
confidence = float(parts[i + 1].strip())
except ValueError:
pass
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
# Keep the default confidence if parsing fails, but log for observability.
logger.warning(
"Invalid confidence value from LLM response: %r; using default of %s",
parts[i + 1],
confidence,
)

Copilot uses AI. Check for mistakes.
if node.module in dangerous_modules:
imports.add(node.module)
except SyntaxError:
pass
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
pass
# If the code is syntactically invalid, we cannot reliably inspect imports.
# Return the imports found so far (typically an empty set) and let callers
# handle syntax issues separately.
return imports

Copilot uses AI. Check for mistakes.
# Set file permissions to 0600 (user read/write only)
try:
os.chmod(self.config_path, 0o600)
except OSError:
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except OSError:
except OSError:
# Best-effort permission tightening; ignore failures (e.g., non-POSIX filesystems).

Copilot uses AI. Check for mistakes.

return repo_path

except (IOError, OSError):
Copy link

Copilot AI Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Copilot uses AI. Check for mistakes.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@refactron/rag/indexer.py`:
- Around line 61-64: Remove the duplicate unused import of GroqClient (the
redundant "from refactron.llm.client import GroqClient") to fix F401 and shorten
the long RuntimeError message raised when not CHROMA_AVAILABLE (the check using
CHROMA_AVAILABLE and the RuntimeError) so the string is under 100
characters—replace it with a concise message like "ChromaDB not installed; run:
pip install chromadb sentence-transformers" or similar.
- Around line 184-185: The assignment to chunk.content is exceeding the
line-length limit (E501); change the long f-string assignment in the indexer
where chunk.content is set (the line that currently does chunk.content =
f"Summary: {summary}\n\n{chunk.content}") into a shorter form by splitting the
string across multiple concatenated literals or by building the value with a
temporary variable (e.g., prefix = f"Summary: {summary}\n\n" followed by
chunk.content = prefix + chunk.content) so the line length is reduced while
preserving the same content and behavior.
- Line 23: Remove the unused import symbol ParsedFile from the import statement
in refactron.rag.indexer (currently "from refactron.rag.parser import
CodeParser, ParsedFile"); leave only CodeParser and ensure there are no other
references to ParsedFile in this module before committing to eliminate the F401
lint error.
- Around line 250-256: The current call to self.collection.add(...) will raise
on duplicate deterministic IDs during re-indexing; change the call to
self.collection.upsert(...) (preserving the same arguments: documents,
metadatas, ids, embeddings) so existing vectors are updated instead of erroring;
update the invocation in the index_repository flow (where self.collection.add is
used) to use upsert to allow safe re-indexing without clearing the collection.
- Around line 200-215: The prompt string construction (variable prompt)
concatenates chunk.content directly and exceeds line-length limits; break the
long string into shorter concatenated pieces or use multiple parenthesized
f-strings to satisfy E501, and harden against prompt injection by
sanitizing/truncating chunk.content (e.g., strip suspicious directives, limit to
N chars, escape backticks) and/or wrapping the code in explicit markers and
adding a strict system instruction in the call to self.llm_client.generate
(method name) to ignore any embedded instructions in chunk.content; update the
prompt assembly and the call to self.llm_client.generate accordingly so the code
both passes linting and treats chunk.content as untrusted input.
- Around line 127-161: The current loop counts all discovered files for
"total_files" even when _index_file(py_file, ...) fails; change this to track
only successfully indexed files by adding a counter (e.g., successful_files)
that you increment right after chunks = self._index_file(...) succeeds, then use
successful_files instead of len(python_files) when calling
self._save_metadata(...) and when constructing the returned IndexStats (keep
returning total_chunks and chunk_type_counts as before and leave
embedding_model_name and index_path unchanged); update references to
python_files, _index_file, _save_metadata, and IndexStats accordingly.
🧹 Nitpick comments (1)
refactron/rag/indexer.py (1)

110-110: Replace print() warnings with logging.warning().

Multiple warning messages use print() (lines 110, 143, 188). Using the logging module would allow consumers to control verbosity and integrate with their logging infrastructure.

Also applies to: 143-143, 188-188

CHROMA_AVAILABLE = False

from refactron.rag.chunker import CodeChunk
from refactron.rag.parser import CodeParser, ParsedFile
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused import ParsedFile.

This import is flagged by the CI pipeline (F401). ParsedFile is not referenced anywhere in this file.

Proposed fix
-from refactron.rag.parser import CodeParser, ParsedFile
+from refactron.rag.parser import CodeParser
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from refactron.rag.parser import CodeParser, ParsedFile
from refactron.rag.parser import CodeParser
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 23-23: flake8: F401 'refactron.rag.parser.ParsedFile' imported but unused

🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` at line 23, Remove the unused import symbol
ParsedFile from the import statement in refactron.rag.indexer (currently "from
refactron.rag.parser import CodeParser, ParsedFile"); leave only CodeParser and
ensure there are no other references to ParsedFile in this module before
committing to eliminate the F401 lint error.

Comment on lines +61 to +64
if not CHROMA_AVAILABLE:
raise RuntimeError(
"ChromaDB is not available. Install with: pip install chromadb sentence-transformers"
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix CI failures: remove dead import and shorten long line.

Line 71: The from refactron.llm.client import GroqClient import is unused (already imported at module level, line 27) and causes the F401 pipeline error.

Line 63: Exceeds the 100-character line limit (E501).

Proposed fix
         if not CHROMA_AVAILABLE:
-            raise RuntimeError(
-                "ChromaDB is not available. Install with: pip install chromadb sentence-transformers"
+            raise RuntimeError(  # noqa: TRY003
+                "ChromaDB is not available. "
+                "Install with: pip install chromadb sentence-transformers"
             )
 
         self.workspace_path = Path(workspace_path)
         self.index_path = self.workspace_path / ".rag"
         self.index_path.mkdir(exist_ok=True)
 
-        # Initialize LLM client for summarization
-        from refactron.llm.client import GroqClient
-
         self.llm_client = llm_client

Also applies to: 71-71

🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 63-63: flake8: E501 line too long (101 > 100 characters)

🪛 Ruff (0.14.14)

[warning] 62-64: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` around lines 61 - 64, Remove the duplicate unused
import of GroqClient (the redundant "from refactron.llm.client import
GroqClient") to fix F401 and shorten the long RuntimeError message raised when
not CHROMA_AVAILABLE (the check using CHROMA_AVAILABLE and the RuntimeError) so
the string is under 100 characters—replace it with a concise message like
"ChromaDB not installed; run: pip install chromadb sentence-transformers" or
similar.

Comment on lines +127 to +161
total_chunks = 0
chunk_type_counts = {}

# Index each file
for py_file in python_files:
try:
chunks = self._index_file(py_file, summarize=summarize)
total_chunks += len(chunks)

# Count chunk types
for chunk in chunks:
chunk_type_counts[chunk.chunk_type] = (
chunk_type_counts.get(chunk.chunk_type, 0) + 1
)
except Exception as e:
# Skip files that can't be parsed
print(f"Warning: Could not index {py_file}: {e}")
continue

# Save index metadata
self._save_metadata(
{
"total_chunks": total_chunks,
"total_files": len(python_files),
"chunk_types": chunk_type_counts,
}
)

return IndexStats(
total_chunks=total_chunks,
total_files=len(python_files),
chunk_types=chunk_type_counts,
embedding_model=self.embedding_model_name,
index_path=str(self.index_path),
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

total_files counts discovered files, not successfully indexed files.

If files fail to parse (caught at line 141), they're still included in total_files (line 150 uses len(python_files)). This makes the stat misleading — a user might expect total_files to reflect files that contributed chunks.

Proposed fix
         total_chunks = 0
         chunk_type_counts = {}
+        indexed_files = 0
 
         # Index each file
         for py_file in python_files:
             try:
                 chunks = self._index_file(py_file, summarize=summarize)
                 total_chunks += len(chunks)
+                indexed_files += 1
 
                 # Count chunk types
                 for chunk in chunks:
                     chunk_type_counts[chunk.chunk_type] = (
                         chunk_type_counts.get(chunk.chunk_type, 0) + 1
                     )
             except Exception as e:
                 # Skip files that can't be parsed
                 print(f"Warning: Could not index {py_file}: {e}")
                 continue
 
         # Save index metadata
         self._save_metadata(
             {
                 "total_chunks": total_chunks,
-                "total_files": len(python_files),
+                "total_files": indexed_files,
                 "chunk_types": chunk_type_counts,
             }
         )
 
         return IndexStats(
             total_chunks=total_chunks,
-            total_files=len(python_files),
+            total_files=indexed_files,
             chunk_types=chunk_type_counts,
             embedding_model=self.embedding_model_name,
             index_path=str(self.index_path),
         )
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 141-141: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` around lines 127 - 161, The current loop counts all
discovered files for "total_files" even when _index_file(py_file, ...) fails;
change this to track only successfully indexed files by adding a counter (e.g.,
successful_files) that you increment right after chunks = self._index_file(...)
succeeds, then use successful_files instead of len(python_files) when calling
self._save_metadata(...) and when constructing the returned IndexStats (keep
returning total_chunks and chunk_type_counts as before and leave
embedding_model_name and index_path unchanged); update references to
python_files, _index_file, _save_metadata, and IndexStats accordingly.

Comment on lines +184 to +185
# Prepend summary to content for embedding (makes it searchable by plain English)
chunk.content = f"Summary: {summary}\n\n{chunk.content}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Line too long — fix CI E501 error on line 184.

Proposed fix
-                        # Prepend summary to content for embedding (makes it searchable by plain English)
-                        chunk.content = f"Summary: {summary}\n\n{chunk.content}"
+                        # Prepend summary to content for embedding
+                        # (makes it searchable by plain English)
+                        chunk.content = (
+                            f"Summary: {summary}\n\n{chunk.content}"
+                        )
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 184-184: flake8: E501 line too long (105 > 100 characters)

🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` around lines 184 - 185, The assignment to
chunk.content is exceeding the line-length limit (E501); change the long
f-string assignment in the indexer where chunk.content is set (the line that
currently does chunk.content = f"Summary: {summary}\n\n{chunk.content}") into a
shorter form by splitting the string across multiple concatenated literals or by
building the value with a temporary variable (e.g., prefix = f"Summary:
{summary}\n\n" followed by chunk.content = prefix + chunk.content) so the line
length is reduced while preserving the same content and behavior.

Comment on lines +200 to +215
prompt = (
"Analyze the following Python code snippet and provide a one-sentence "
"summary of its purpose, focusing on what it DOES (e.g. 'Calculates user permissions' "
"or 'Handles secure database connections').\n\n"
f"Code:\n{chunk.content}"
)

try:
summary = self.llm_client.generate(
prompt=prompt,
system="You are a senior software architect. Provide a concise, semantic summary of code purpose.",
max_tokens=100,
)
return summary.strip()
except Exception:
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix line length on line 210 and note prompt injection surface.

Line 210 exceeds the 100-char limit (E501). Additionally, chunk.content is interpolated directly into the LLM prompt (line 204) — if indexing untrusted code, this is a prompt injection vector. Low risk for local repos but worth a defensive note.

Proposed fix for line length
         try:
             summary = self.llm_client.generate(
                 prompt=prompt,
-                system="You are a senior software architect. Provide a concise, semantic summary of code purpose.",
+                system=(
+                    "You are a senior software architect. "
+                    "Provide a concise, semantic summary of code purpose."
+                ),
                 max_tokens=100,
             )
             return summary.strip()
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 210-210: flake8: E501 line too long (115 > 100 characters)

🪛 Ruff (0.14.14)

[warning] 214-214: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` around lines 200 - 215, The prompt string
construction (variable prompt) concatenates chunk.content directly and exceeds
line-length limits; break the long string into shorter concatenated pieces or
use multiple parenthesized f-strings to satisfy E501, and harden against prompt
injection by sanitizing/truncating chunk.content (e.g., strip suspicious
directives, limit to N chars, escape backticks) and/or wrapping the code in
explicit markers and adding a strict system instruction in the call to
self.llm_client.generate (method name) to ignore any embedded instructions in
chunk.content; update the prompt assembly and the call to
self.llm_client.generate accordingly so the code both passes linting and treats
chunk.content as untrusted input.

Comment on lines +250 to +256
# Add to collection
self.collection.add(
documents=documents,
metadatas=metadatas,
ids=ids,
embeddings=embeddings,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

collection.add() will fail on re-indexing due to duplicate IDs — use upsert() instead.

IDs are deterministic (file path + line range). If index_repository is called again on the same repo (or overlapping files), ChromaDB's add() raises a duplicate ID error. This makes re-indexing impossible without manually clearing the collection first.

Proposed fix
         # Add to collection
-        self.collection.add(
+        self.collection.upsert(
             documents=documents,
             metadatas=metadatas,
             ids=ids,
             embeddings=embeddings,
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Add to collection
self.collection.add(
documents=documents,
metadatas=metadatas,
ids=ids,
embeddings=embeddings,
)
# Add to collection
self.collection.upsert(
documents=documents,
metadatas=metadatas,
ids=ids,
embeddings=embeddings,
)
🤖 Prompt for AI Agents
In `@refactron/rag/indexer.py` around lines 250 - 256, The current call to
self.collection.add(...) will raise on duplicate deterministic IDs during
re-indexing; change the call to self.collection.upsert(...) (preserving the same
arguments: documents, metadatas, ids, embeddings) so existing vectors are
updated instead of erroring; update the invocation in the index_repository flow
(where self.collection.add is used) to use upsert to allow safe re-indexing
without clearing the collection.

- Added complex_test_repo generated files to .gitignore
- Prevents accidental commits of test artifacts
- Cleaned up working directory
- Replaced insecure substring checks with proper URL parsing
- Use urlparse() to validate hostname exactly equals 'github.com'
- Prevents URL injection attacks (e.g., evil.com/github.com/fake)
- Fixes CodeQL high-severity security alert

Before: if "github.com" in url (VULNERABLE)
After: if parsed.hostname == "github.com" (SECURE)
- Try Language(capsule, 'python') first (Python 3.8 API)
- Fall back to Language(capsule) for newer versions
- Resolves 18 RAG test errors on Python 3.8
Auto-formatted 9 files to pass CI checks:
- refactron/core/workspace.py
- refactron/rag/parser.py
- tests/test_*_client.py, test_*orchestrator.py, test_rag_*.py
- Fixed trailing whitespace
- Applied black formatting
- Fixed import ordering with isort
- Added noqa comments for unused imports in LLM/RAG modules
- Fixed duplicate datetime import in repositories.py
- Fixed f-strings without placeholders in analyze script
- Fixed regex escape sequence in test_backend_client
- Suppressed E203 (black/flake8 conflict) and E501 line length

Remaining lints are non-critical and in test/dev code.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

🤖 Fix all issues with AI agents
In `@complex_test_repo/core/engine.py`:
- Around line 38-53: The query_user function builds SQL via f-string causing SQL
injection; change it to use a parameterized query with the DB-API parameter
placeholder your connection expects (e.g., %s or ?), pass user_id as a separate
parameter to cursor.execute instead of interpolating it into query, and
optionally validate/cast user_id to an int before executing; update the
variables cursor and query in query_user to use
cursor.execute(parameterized_query, (user_id,)) and return cursor.fetchone() as
before.
- Around line 24-36: The execute method currently uses os.system(command) in the
"dangerous" mode which allows command injection; replace this unsafe call in
execute with a safe subprocess.run invocation that accepts an argument list (not
a shell string) and shell=False, or alternatively gate the risky branch behind
an explicit opt-in (e.g., an environment flag like ALLOW_INSECURE_EXEC) and/or
move it to an clearly labeled insecure test module; update the execute method
(referencing execute, self.mode, and the os.system usage) to build a list of
arguments instead of passing raw command strings and call subprocess.run(...,
check=True, shell=False) so external input cannot trigger shell interpretation.

In `@complex_test_repo/data/processor.py`:
- Line 1: Remove the unused import by deleting the top-level "import time"
statement in processor.py; ensure no functions or methods reference time
anywhere in the file, run linters/tests to confirm no remaining references, and
commit the change to keep the module clean.
- Around line 36-51: deep_nesting_example currently references six undefined
names and unused params causing NameError; either add minimal stub
implementations for default_value, meets_requirement_1, early_result_1,
meets_requirement_2, early_result_2, and perform_main_operation (and use or
remove parameters b, c, d) so the module is importable, or replace the function
body with a docstring-only/gated example (non-executable) and remove the unused
parameters; specifically update the function deep_nesting_example to call the
new stubs (or be turned into documentation) so Ruff and CI no longer flag
undefined names and unused arguments.

In `@complex_test_repo/utils/math_lib.py`:
- Line 1: Remove the unused top-level import "os" from math_lib.py to resolve
the flake8 F401 failure; locate the import statement ("import os") at the top of
the file (e.g., in utils/math_lib.py) and delete that line ensuring no other
code references the os symbol before committing.

In `@refactron/llm/orchestrator.py`:
- Line 8: Remove the unused List import from the typing import in
orchestrator.py to fix the flake8 F401 error: edit the import line that
currently reads "from typing import List, Optional, Union" and drop "List" so it
becomes "from typing import Optional, Union"; verify no other references to the
symbol List exist in functions or classes within the module (e.g., any type
hints in Orchestrator-related functions) before committing.
- Around line 205-216: The parsing loop in orchestrator.py that checks for
"@@@EXPLANATION@@@" should guard against out-of-range accesses when reading
parts[i + 1]; update the loop in the method that builds
explanation/confidence/proposed_code so each branch (when part ==
"EXPLANATION"/"CONFIDENCE"/"MARKDOWN") first verifies i + 1 < len(parts) before
using parts[i + 1], and for the CONFIDENCE branch also handle non-numeric text
as currently done (ValueError); when a missing value is detected, set a clear
default (e.g., None or 0.0) or log a specific error instead of relying on the
outer generic exception to catch it.

In `@refactron/llm/safety.py`:
- Line 4: Remove the unused Optional import to satisfy flake8 F401: edit the
import statement that currently reads "from typing import List, Optional" in
refactron/llm/safety.py and change it to only import the symbols actually used
(e.g., "from typing import List"), ensuring Optional is no longer imported
anywhere in the file.

In `@refactron/rag/parser.py`:
- Line 7: Remove the unused typing imports causing the flake8 F401 error by
editing the import statement in refactron/rag/parser.py: delete the unused
symbols Any and Dict from the "from typing import ..." line so only the actually
used types (List, Optional, Tuple) remain; ensure no other references to Any or
Dict exist in functions/classes (e.g., parser-related functions) before
committing.
- Around line 56-59: The RuntimeError message in the TREE_SITTER_AVAILABLE check
inside parser.py exceeds the 100-char line limit; split the long string into
shorter concatenated literals or use implicit parentheses so the raised message
for the TREE_SITTER_AVAILABLE check (the RuntimeError in the if not
TREE_SITTER_AVAILABLE block) is wrapped across multiple lines under 100 chars
each while preserving the same full message text.
- Around line 247-257: _extract_parameters currently only reads children with
type "identifier", missing typed_parameter, default_parameter,
list_splat_pattern and dictionary_splat_pattern; update it to handle these node
types by: for typed_parameter and default_parameter, extract the parameter name
from their identifier/name child (e.g., child.child_by_field_name("name") or the
identifier descendant) and decode as before; for list_splat_pattern and
dictionary_splat_pattern, extract the inner identifier but preserve the splat
prefix ("*" or "**") when appending; keep existing behavior for plain
"identifier" nodes and return the assembled names list. Use the function name
_extract_parameters and the node field "parameters" to locate where to change
logic.

In `@scripts/analyze_feedback_data.py`:
- Around line 40-42: all_patterns.update(patterns) silently overwrites existing
pattern IDs; before merging, check for duplicates by computing dup_keys =
set(all_patterns.keys()) & set(patterns.keys()), and either log a warning with
the dup_keys and counts (using the module logger) or explicitly
deduplicate/merge instead of overwriting (e.g., convert all_patterns values to
lists or use setdefault/append to keep multiple entries) so that total_patterns
is correct and duplicate sources are visible; implement the chosen behavior
around the all_patterns.update(patterns) call.
- Around line 16-19: The sys.path.insert call used to prepend the parent
directory is causing a top-of-file import (flake8 E402) before the
PatternStorage import; remove the path-hack and instead ensure the package is
importable (install the project in editable mode via pip install -e .) so from
refactron.patterns.storage import PatternStorage can be at module top, or if you
need a minimal unblock add a local suppression by appending # noqa: E402 to the
sys.path.insert line (and preferably add a comment explaining why) so the linter
stops failing.
- Around line 54-99: Remove stray f-string prefixes from print calls that
contain no interpolations: change the print calls for the literal headers and
labels (the "AGGREGATE STATISTICS" header, "📊 Total Records:", "\n✅ Action
Distribution:", "\n🔧 Operation Types:", "\n📋 Data Quality:", "\n🎯 ML
Readiness:" lines and the three status lines under ML Readiness) to plain string
literals (drop the leading f) while leaving f-strings that do interpolate (e.g.,
prints that reference len(all_feedback), len(all_patterns), with_patterns,
with_reason, quality_score, and the {'='*60} expressions) intact; then run
linters to confirm F541 is resolved.

In `@tests/test_backend_client.py`:
- Around line 21-24: The test test_backend_client_initialization calls
BackendLLMClient.__init__, which invokes load_credentials and can cause I/O/side
effects; update the test to mock or patch load_credentials (or use the existing
mock_credentials fixture) so __init__ receives a controlled credential return
value instead of performing real I/O—e.g., ensure load_credentials is stubbed
before instantiating BackendLLMClient so the test only asserts
client.backend_url and client.model without touching the filesystem.
- Around line 58-59: The regex string passed to pytest.raises uses an escaped
parenthesis causing a W605 invalid escape sequence; update the match argument in
the pytest.raises call (the one that asserts RuntimeError with message "Backend
LLM proxy error (500): Internal Server Error") to use a raw string literal
(e.g., r"Backend LLM proxy error \(500\): Internal Server Error") so the
backslash is treated literally and the invalid escape is avoided.
- Around line 3-6: The tests import unused requests causing a flake8 F401;
remove the top-level "import requests" from tests/test_backend_client.py since
the tests already use string-based patching via `@patch`("requests.post") and
`@patch`("requests.get") and do not reference the requests symbol directly; update
any test that may rely on the symbol to use the patched calls from the decorator
instead.

In `@tests/test_llm_orchestrator.py`:
- Line 72: Fix the typo in the inline comment inside the test that currently
reads "# Verify prompt construction (implicity) by checking generate
call"—change "implicity" to "implicitly" so the comment reads "# Verify prompt
construction (implicitly) by checking generate call"; update this comment in
tests/test_llm_orchestrator.py where the prompt-construction verification is
implemented (the test that asserts the generate call).
- Line 3: The file contains an unused top-level import "json" (import json)
which triggers flake8 F401; remove that import line from
tests/test_llm_orchestrator.py (or replace it with use of json if intended) so
the unused symbol "json" is no longer imported; ensure no other references to
"json" remain in functions or tests in this file such as any test helpers or
fixtures before committing.

In `@tests/test_rag_indexer.py`:
- Around line 3-6: Remove the unused imports `sys` and `create_autospec` from
the import block (the top-level import statement that currently lists sys,
tempfile, Path, MagicMock, Mock, create_autospec, patch) so only used symbols
remain; update the import line to import tempfile, Path, MagicMock, Mock, and
patch and run the tests/flake8 to confirm the F401 failures are resolved.
🧹 Nitpick comments (10)
complex_test_repo/utils/math_lib.py (3)

3-4: CONSTANT_0_95 is a non-descriptive name — it just restates the value.

SURCHARGE_RATE is fine because it conveys business meaning. CONSTANT_0_95 does not — consider naming it after its purpose (e.g., DISCOUNT_RATE, BASE_FACTOR, or whatever it represents in the domain).

Example
-CONSTANT_0_95 = 0.95
+DISCOUNT_RATE = 0.95

Then update usage in legacy_compute accordingly.


18-18: Stale comment contradicts the code.

The inline comment says # Magic numbers and no docstring, but the function does have a docstring and the magic numbers have been extracted into module-level constants. Remove or update this comment so it doesn't mislead future readers.

Proposed fix
-    # Magic numbers and no docstring
     res = a * SURCHARGE_RATE + b * CONSTANT_0_95

11-13: Docstring parameter descriptions are not meaningful.

a: The a and b: The b don't help consumers understand what to pass. Consider describing the domain meaning of these parameters (e.g., base amount, adjustment value).

complex_test_repo/core/engine.py (2)

55-70: Replace string concatenation loop with str.join.

Repeated += on strings is O(n²). Also note the result has a trailing comma, which is likely unintended.

♻️ Proposed fix
     def process_items(self, items):
-        # Performance issue: String concatenation in loop
-        result = ""
-        for item in items:
-            result += str(item) + ","
-        return result
+        return ",".join(str(item) for item in items)

5-11: Placeholder docstrings provide no value.

The class and method docstrings use generic filler text (e.g., attribute1: Description of attribute1). Either document the actual attributes and behavior or remove the docstrings to avoid misleading readers.

scripts/analyze_feedback_data.py (2)

26-27: Glob search rooted at CWD may be slow or miss data.

Path(".") ties the scan to wherever the script is invoked from, which may unintentionally recurse into large directory trees (e.g., node_modules, .git, virtualenvs) or miss storage dirs if run from the wrong directory. Consider accepting an explicit root path as a function argument (defaulting to "."), and potentially skipping known heavy directories.


114-116: Report file is written to the current working directory without error handling.

feedback_analysis.json is unconditionally written to CWD. If the directory isn't writable, this will raise an unhandled exception and the function will fail after printing all the analysis. Consider wrapping in a try/except or making the output path configurable.

refactron/llm/safety.py (1)

67-87: Keyword-based risk assessment can match inside comments, strings, and variable names.

"open(" in code will match occurrences in comments (# open(file)) and string literals, and also flags safe read-only file opens. "requests." matches in docstrings, etc. This is acceptable for a first-pass heuristic, but consider using ast.walk (as you already do in _check_dangerous_imports) for more precise detection if false-positive rates prove problematic.

refactron/rag/parser.py (1)

106-117: Docstring extraction is duplicated across three methods.

_extract_module_docstring, _extract_function_docstring, and _extract_class_docstring share identical logic for locating and extracting a string from the first expression_statement child. Consider extracting a shared helper.

Example consolidation
+    def _extract_docstring_from_body(self, body_node: Node, source: bytes) -> Optional[str]:
+        """Extract docstring from the first expression_statement in a body node."""
+        if not body_node:
+            return None
+        for child in body_node.children:
+            if child.type == "expression_statement":
+                string_node = child.children[0] if child.children else None
+                if string_node and string_node.type == "string":
+                    return (
+                        source[string_node.start_byte : string_node.end_byte]
+                        .decode("utf-8")
+                        .strip("\"'")
+                    )
+        return None

Then reuse in all three extraction methods.

Also applies to: 213-245

refactron/llm/orchestrator.py (1)

122-136: Consider using logging.exception instead of logging.error for better diagnostics.

logging.exception automatically includes the traceback, which is valuable for debugging LLM generation failures. Same applies to Lines 153 and 235.

Proposed fix
         except Exception as e:
-            logger.error(f"LLM generation failed: {e}")
+            logger.exception(f"LLM generation failed: {e}")
             logger.debug(f"Raw response: {response_text}")

Comment on lines +21 to +24
def test_backend_client_initialization():
client = BackendLLMClient(backend_url="http://test-backend:3000")
assert client.backend_url == "http://test-backend:3000"
assert client.model == "llama-3.3-70b-versatile"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

test_backend_client_initialization does not mock load_credentials.

Unlike the other tests, this test doesn't use the mock_credentials fixture. BackendLLMClient.__init__ calls load_credentials(), which may fail or produce side effects (e.g., file I/O, missing config) when run in CI without a real credentials file.

Proposed fix
-def test_backend_client_initialization():
-    client = BackendLLMClient(backend_url="http://test-backend:3000")
+def test_backend_client_initialization(mock_credentials):
+    client = BackendLLMClient(backend_url="http://test-backend:3000")
     assert client.backend_url == "http://test-backend:3000"
     assert client.model == "llama-3.3-70b-versatile"
🤖 Prompt for AI Agents
In `@tests/test_backend_client.py` around lines 21 - 24, The test
test_backend_client_initialization calls BackendLLMClient.__init__, which
invokes load_credentials and can cause I/O/side effects; update the test to mock
or patch load_credentials (or use the existing mock_credentials fixture) so
__init__ receives a controlled credential return value instead of performing
real I/O—e.g., ensure load_credentials is stubbed before instantiating
BackendLLMClient so the test only asserts client.backend_url and client.model
without touching the filesystem.

Comment on lines +58 to +59
with pytest.raises(
RuntimeError, match="Backend LLM proxy error \(500\): Internal Server Error"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Invalid escape sequence \( — pipeline is failing (W605).

Use a raw string for the match pattern to avoid the invalid escape.

Proposed fix
     with pytest.raises(
-        RuntimeError, match="Backend LLM proxy error \(500\): Internal Server Error"
+        RuntimeError, match=r"Backend LLM proxy error \(500\): Internal Server Error"
     ):
🧰 Tools
🪛 GitHub Actions: Pre-commit

[warning] 59-59: flake8: W605 invalid escape sequence '('

🤖 Prompt for AI Agents
In `@tests/test_backend_client.py` around lines 58 - 59, The regex string passed
to pytest.raises uses an escaped parenthesis causing a W605 invalid escape
sequence; update the match argument in the pytest.raises call (the one that
asserts RuntimeError with message "Backend LLM proxy error (500): Internal
Server Error") to use a raw string literal (e.g., r"Backend LLM proxy error
\(500\): Internal Server Error") so the backslash is treated literally and the
invalid escape is avoided.

@@ -0,0 +1,130 @@
"""Tests for LLM Orchestrator."""

import json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused import json.

Flagged by the CI pipeline (flake8 F401).

Proposed fix
-import json
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import json
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 3-3: flake8: F401 'json' imported but unused

🤖 Prompt for AI Agents
In `@tests/test_llm_orchestrator.py` at line 3, The file contains an unused
top-level import "json" (import json) which triggers flake8 F401; remove that
import line from tests/test_llm_orchestrator.py (or replace it with use of json
if intended) so the unused symbol "json" is no longer imported; ensure no other
references to "json" remain in functions or tests in this file such as any test
helpers or fixtures before committing.

assert suggestion.explanation == "Fixed the bug"
assert suggestion.model_name == "llama-3.3-70b-versatile"

# Verify prompt construction (implicity) by checking generate call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Typo: "implicity" → "implicitly".

-        # Verify prompt construction (implicity) by checking generate call
+        # Verify prompt construction (implicitly) by checking generate call
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Verify prompt construction (implicity) by checking generate call
# Verify prompt construction (implicitly) by checking generate call
🤖 Prompt for AI Agents
In `@tests/test_llm_orchestrator.py` at line 72, Fix the typo in the inline
comment inside the test that currently reads "# Verify prompt construction
(implicity) by checking generate call"—change "implicity" to "implicitly" so the
comment reads "# Verify prompt construction (implicitly) by checking generate
call"; update this comment in tests/test_llm_orchestrator.py where the
prompt-construction verification is implemented (the test that asserts the
generate call).

Comment on lines +3 to +6
import sys
import tempfile
from pathlib import Path
from unittest.mock import MagicMock, Mock, create_autospec, patch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove unused imports sys and create_autospec — pipeline is failing.

Both sys and create_autospec are imported but never used, causing flake8 F401 errors in CI.

Proposed fix
-import sys
 import tempfile
 from pathlib import Path
-from unittest.mock import MagicMock, Mock, create_autospec, patch
+from unittest.mock import MagicMock, Mock, patch
🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 3-3: flake8: F401 'sys' imported but unused


[error] 6-6: flake8: F401 'unittest.mock.create_autospec' imported but unused

🤖 Prompt for AI Agents
In `@tests/test_rag_indexer.py` around lines 3 - 6, Remove the unused imports
`sys` and `create_autospec` from the import block (the top-level import
statement that currently lists sys, tempfile, Path, MagicMock, Mock,
create_autospec, patch) so only used symbols remain; update the import line to
import tempfile, Path, MagicMock, Mock, and patch and run the tests/flake8 to
confirm the F401 failures are resolved.

…printing

- Implemented robust tree-sitter Language initialization to handle multiple API versions
- Added ast.dump fallback for ast.unparse to fix code anonymization on Python 3.8
- Addresses 18 RAG errors and 2 pattern fingerprinting failures in CI
@omsherikar omsherikar merged commit 0107648 into main Feb 8, 2026
14 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies documentation Improvements or additions to documentation enhancement New feature or request refactoring security size: x-large testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants