
@rostilos (Owner) commented Jan 20, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added delta indexing for efficient branch-specific RAG queries
    • Implemented hybrid query support for improved pull request context retrieval
    • Added event-driven system for analysis and RAG operation tracking
    • Added repository branch listing and diff retrieval capabilities
    • Introduced mainBranch configuration with backward compatibility for defaultBranch
  • Chores

    • Reorganized internal package structure for better modularity
    • Database migration added for delta index persistence

✏️ Tip: You can customize this high-level summary in your review settings.

rostilos and others added 3 commits January 18, 2026 17:28
- Implemented GitLabBranchWebhookHandler to handle push events for branch analysis.
- Created GitLabMergeRequestWebhookHandler to manage merge request events and trigger analysis.
- Added GitLabMrMergeWebhookHandler for handling merge events and updating issue status.
- Introduced GitLabWebhookParser to parse various GitLab webhook payloads.
- Updated ProjectController to return CommentCommandsConfig directly.
- Refactored ProjectService to utilize new configuration models for branch analysis and comment commands.
@coderabbitai bot commented Jan 20, 2026

Important

Review skipped

Too many files!

109 of the 259 changed files are over the 150-file review limit.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.

📝 Walkthrough

This pull request introduces a hierarchical Retrieval-Augmented Generation (RAG) system with delta indexing, event-driven architecture, and modular configuration. It adds new analysis-api and events modules, refactors project configuration types, implements delta index persistence and management, restructures pipeline-agent package organization, and extends VCS clients with branch listing and diff retrieval capabilities.

Changes

Analysis API Module Setup
java-ecosystem/libs/analysis-api/.gitignore, java-ecosystem/libs/analysis-api/pom.xml, java-ecosystem/libs/analysis-api/src/main/java/module-info.java
Establishes new public analysis-api module with Java 17 compiler, core/slf4j dependencies, and module declarations. .gitignore patterns added for build outputs and IDE artifacts.
RAG Operations Service Interface
java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java
Defines public interface for RAG operations with methods for capability queries, incremental updates, delta index lifecycle management, hybrid RAG decisions, and PR-target delta orchestration. Default implementations provided for extensibility.
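A minimal sketch of what such a default-method interface could look like; the method names echo this walkthrough (isDeltaIndexReady, shouldUseHybridRag), but every signature here is an assumption, not the real analysis-api contract:

```java
// Hypothetical sketch of the analysis-api service contract.
// All signatures are assumptions based on the walkthrough above.
interface RagOperationsService {
    // Capability query: deployments without a RAG backend keep the default.
    default boolean isRagAvailable() { return false; }

    // Delta index lifecycle: safe no-op defaults so implementors opt in per method.
    default boolean isDeltaIndexReady(long projectId, String branch) { return false; }
    default void createOrUpdateDeltaIndex(long projectId, String branch) { }

    // Hybrid decision: defaults to base-only retrieval.
    default boolean shouldUseHybridRag(long projectId, String baseBranch, String targetBranch) {
        return false;
    }
}
```

Because every method has a default, an implementation can override only the operations its deployment supports, which matches the "default implementations provided for extensibility" note above.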
Analysis Engine Integration
java-ecosystem/libs/analysis-engine/pom.xml, java-ecosystem/libs/analysis-engine/src/main/java/module-info.java, java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/BranchAnalysisProcessor.java, java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java
Adds analysis-api and events dependencies. Branch and PR processors inject ApplicationEventPublisher and RagOperationsService, publishing AnalysisStartedEvent/AnalysisCompletedEvent with correlationId and metrics. Includes event-guarded delta index triggers.
Deprecated RAG Service Shim
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/service/rag/RagOperationsService.java
Marks old RagOperationsService as @Deprecated and re-exports analysis-api version for backward compatibility.
Events Module
java-ecosystem/libs/events/.gitignore, java-ecosystem/libs/events/pom.xml, java-ecosystem/libs/events/src/main/java/module-info.java
Establishes events module with Spring Context dependency, exports 4 event packages (analysis, rag, project, core).
Event Classes
java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/CodecrowEvent.java, java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/analysis/..., java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/rag/..., java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/project/...
Adds abstract CodecrowEvent base with eventId, timestamp, correlationId. Concrete events: AnalysisStartedEvent (AnalysisType enum), AnalysisCompletedEvent (CompletionStatus enum), RagIndexStartedEvent (IndexType, IndexOperation enums), RagIndexCompletedEvent, ProjectConfigChangedEvent.
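The event hierarchy described above might look roughly like this; the field names (eventId, timestamp, correlationId) come from the walkthrough, everything else is assumed:

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical sketch of the abstract event base described above.
abstract class CodecrowEvent {
    private final String eventId = UUID.randomUUID().toString();
    private final Instant timestamp = Instant.now();
    private final String correlationId;

    protected CodecrowEvent(String correlationId) {
        this.correlationId = correlationId;
    }

    public String getEventId() { return eventId; }
    public Instant getTimestamp() { return timestamp; }
    public String getCorrelationId() { return correlationId; }
}

// Example concrete event mirroring AnalysisStartedEvent's enum-carrying shape.
class AnalysisStartedEvent extends CodecrowEvent {
    enum AnalysisType { BRANCH, PULL_REQUEST }

    private final AnalysisType type;

    AnalysisStartedEvent(String correlationId, AnalysisType type) {
        super(correlationId);
        this.type = type;
    }

    public AnalysisType getType() { return type; }
}
```

Carrying the correlationId through the base class is what lets the started/completed event pairs in the processors be matched back up downstream.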
Core Model: Configuration Extraction
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/BranchAnalysisConfig.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/CommandAuthorizationMode.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/CommentCommandsConfig.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/InstallationMethod.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/RagConfig.java
Extracts 5 previously nested configuration types from ProjectConfig into standalone public classes. RagConfig includes delta fields and branch pattern matching. CommentCommandsConfig includes rate limiting and authorization helpers.
Core Model: ProjectConfig Refactoring
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/ProjectConfig.java
Introduces mainBranch field, deprecates defaultBranch with fallback. Removes nested type definitions. Adds setMainBranch with delta config sync and ensureMainBranchInPatterns() for pattern enforcement. Equals/hashCode/toString use mainBranch.
Delta Index Entity & Repository
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/rag/DeltaIndexStatus.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/rag/RagDeltaIndex.java, java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/rag/RagDeltaIndexRepository.java
Adds DeltaIndexStatus enum (CREATING, READY, STALE, ARCHIVED, FAILED). RagDeltaIndex JPA entity persists delta index metadata with unique (project_id, branch_name) constraint. Repository provides lifecycle queries: readiness checks, stale detection, archive handling, count aggregations, bulk status mutations.
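The status lifecycle can be sketched as an enum; the helper methods here are assumptions illustrating how the repository's readiness and stale-detection queries might map onto the states:

```java
// Sketch of the DeltaIndexStatus lifecycle named above.
// The helper methods are assumptions, not part of the summarized entity.
enum DeltaIndexStatus {
    CREATING, READY, STALE, ARCHIVED, FAILED;

    // Only READY indexes should serve hybrid queries.
    boolean isServable() { return this == READY; }

    // STALE and FAILED indexes are candidates for re-creation.
    boolean needsRebuild() { return this == STALE || this == FAILED; }
}
```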
Module Exports & DB Migration
java-ecosystem/libs/core/src/main/java/module-info.java, java-ecosystem/libs/core/src/main/resources/db/migration/1.2.0/V1.2.0__add_rag_delta_index_table.sql
Exports RAG model/repository packages and opens to Spring/Hibernate. Migration creates rag_delta_index table with status/branch/project indexes and cascade delete.
Core DTO Updates
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/dto/project/ProjectDTO.java
Adds mainBranch and deprecated defaultBranch. RagConfigDTO extended with deltaEnabled and deltaRetentionDays with backward-compatible constructor. CommentCommandsConfigDTO.fromConfig signature updated.
Repository Extensions
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/pullrequest/PullRequestRepository.java
Adds findByProject_IdOrderByPrNumberDesc for descending PR retrieval.
Removed Repository
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/repository/QualityGateRepository.java
Deletes entire QualityGateRepository interface (36 lines removed).
Parent POM Updates
java-ecosystem/pom.xml
Adds codecrow-analysis-api and codecrow-events to dependencyManagement; registers libs/analysis-api and libs/events modules.
RAG Engine Enhancements
java-ecosystem/libs/rag-engine/pom.xml, java-ecosystem/libs/rag-engine/src/main/java/module-info.java
Adds analysis-api and events dependencies and module requires.
RAG Pipeline Client Extensions
java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/client/RagPipelineClient.java
Adds delta index operations (create/update/delete/exists) and hybrid PR context endpoints. Introduces generic doRequest(url, payload, client, method) and put() method. Respects ragEnabled flag with fallback defaults.
Delta Index Service
java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/DeltaIndexService.java
New service wrapping RagPipelineClient with four public methods: createOrUpdateDeltaIndex, updateDeltaIndex, deleteDeltaIndex, deltaIndexExists. Normalizes responses and handles IOExceptions.
RAG Operations Implementation
java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/RagOperationsServiceImpl.java
Implements delta index lifecycle methods (isDeltaIndexReady, createOrUpdateDeltaIndex, ensureDeltaIndexForPrTarget, ensureRagIndexUpToDate) with VCS integration, RagDeltaIndexRepository persistence, and event signaling.
VCS Client Enhancements
java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/VcsClient.java, java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/bitbucket/cloud/BitbucketCloudClient.java, java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java, java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/gitlab/GitLabClient.java
Adds listBranches() and getBranchDiff(baseBranch, compareBranch) methods. Implementations handle pagination, branch name collection, and unified diff format retrieval from respective VCS APIs.
Pipeline Agent: Package Restructuring
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/..., java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/..., java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/..., java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/...
Reorganizes 30+ files: moves from flat pipelineagent.{controller,service,processor,webhookhandler} to nested generic/{controller,service,processor,webhookhandler}, {bitbucket,github,gitlab}/{service,webhookhandler}. Imports updated to reference new paths and extracted config types.
Web Server: Integration & Analysis
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/integration/controller/VcsIntegrationController.java, java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/integration/service/VcsIntegrationService.java, java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/PullRequestController.java
Adds listBranches() endpoint and service method. Integrates VCS branch/diff APIs. Updates PullRequestRepository calls to use descending order.
Web Server: Project Management
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/controller/ProjectController.java, java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/service/ProjectService.java, java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/..., java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/response/RagDeltaIndexDTO.java
Migrates from ProjectConfig nested types to extracted types. Adds mainBranch field with fallback to defaultBranch in DTOs. Extends UpdateRagConfigRequest with deltaEnabled/deltaRetentionDays. Adds RagDeltaIndexDTO mapping. Updates ProjectService creation/update flows to use mainBranch and call ensureMainBranchInPatterns().
Python RAG Pipeline: Delta Indexing
python-ecosystem/rag-pipeline/src/rag_pipeline/core/delta_index_manager.py
New 681-line module implementing DeltaIndexManager with DeltaIndexStatus enum and DeltaIndexStats dataclass. Supports creating/updating/deleting delta indexes, diff computation via git/parsing, batch vectorization with Qdrant, and metadata enrichment.
Python RAG Pipeline: Hybrid Retrieval
python-ecosystem/rag-pipeline/src/rag_pipeline/services/hybrid_query_service.py
New 482-line module implementing HybridQueryService with HybridQueryResult dataclass. Orchestrates base + delta retrieval, query decomposition, result merging with delta boost, and priority-based re-ranking. Supports fallback when thresholds strict.
Python RAG Pipeline: Query Fallback
python-ecosystem/rag-pipeline/src/rag_pipeline/services/query_service.py
Adds _get_fallback_branch() helper and fallback logic in get_context_for_pr. Tracks fallback usage and embeds branch metadata in results.
Python RAG Pipeline: API Integration
python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py
Bumps version 1.0.0→1.1.0. Instantiates DeltaIndexManager and HybridQueryService. Adds 10 new endpoints: delta lifecycle (POST/PUT/GET/DELETE/list/exists), hybrid PR context (POST), should-use-hybrid (GET), enhanced /system/gc. Introduces request/response models for delta and hybrid operations.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • PR #59: Modifies the same RAG delta/hybrid indexing subsystems, event infrastructure, RagOperationsService wiring, and VCS diff APIs at the code level.
  • PR #20: Updates VCS diff handling endpoints and fallback logic in GitHub/Bitbucket clients that overlap with the diff retrieval enhancements in this PR.

Poem

🐰 With delta branches indexed fast and hybrid queries set,
Hierarchical RAG takes flight, no better system yet!
Events cascade through modules clean, config types stand tall,
Package roads reorganize—a rabbit's dream to call. ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 15.38%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.
  • Title check: ❓ Inconclusive. The PR title "1.2.0 rc" is vague and generic, using a version number without conveying the changeset's primary purpose or scope. Resolution: replace it with a descriptive title that summarizes the main change, such as "Add delta RAG indexes and hybrid retrieval support" or "Introduce hierarchical RAG system with delta indexes".

✅ Passed checks (1 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java (2)

111-136: Error messages may be posted as reviews.

When buildReviewRequest returns null or exceptions occur, this method returns user-facing error strings (e.g., "⚠️ **Review Failed**..."). These pass the review.isBlank() check in process() and will be posted as if they were successful reviews.

Consider returning null or throwing an exception for errors, letting process() handle all error responses uniformly.

🐛 Suggested fix
     private String generateReview(Project project, WebhookPayload payload) {
         try {
             ReviewRequest request = buildReviewRequest(project, payload);
             
             if (request == null) {
                 log.warn("Failed to build review request - missing AI or VCS configuration");
-                return "⚠️ **Review Failed**\n\nCould not generate review - missing AI or VCS configuration.";
+                return null;
             }
             
             log.info("Calling AI service to generate code review...");
             
             ReviewResult result = aiCommandClient.review(request, event -> {
                 log.debug("AI review event: {}", event);
             });
             
             log.info("Code review generated successfully");
             return result.review();
             
         } catch (IOException e) {
             log.error("Failed to generate review via AI: {}", e.getMessage(), e);
-            return "⚠️ **Review Failed**\n\nError generating code review: " + e.getMessage();
+            return null;
         } catch (Exception e) {
             log.error("Unexpected error generating review: {}", e.getMessage(), e);
-            return "⚠️ **Review Failed**\n\nUnexpected error: " + e.getMessage();
+            return null;
         }
     }

163-165: Potential NumberFormatException when parsing pullRequestId.

If payload.pullRequestId() is non-null but contains a non-numeric string, Long.parseLong() will throw NumberFormatException. This exception propagates to generateReview() where it's caught by the generic Exception handler, but adding explicit handling would provide a clearer error message.

🐛 Suggested fix
-            Long prId = payload.pullRequestId() != null 
-                ? Long.parseLong(payload.pullRequestId()) 
-                : null;
+            Long prId = null;
+            if (payload.pullRequestId() != null) {
+                try {
+                    prId = Long.parseLong(payload.pullRequestId());
+                } catch (NumberFormatException e) {
+                    log.warn("Invalid PR ID format: {}", payload.pullRequestId());
+                    return null;
+                }
+            }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java (1)

76-78: Add null check before calling payload.rawPayload().has("action").

Line 76-78 calls payload.rawPayload() without a null check. Evidence from ProviderWebhookController (line 311) shows that payload.rawPayload() can return null, yet this code immediately invokes .has("action") on the result, causing a potential NullPointerException. Add a null check or use safe navigation:

String action = payload.rawPayload() != null && payload.rawPayload().has("action") 
        ? payload.rawPayload().get("action").asText() 
        : null;
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/service/BitbucketReportingService.java (1)

203-205: Log the full exception, not just the message.

Catching Exception broadly and logging only e.getMessage() loses the stack trace, making debugging harder if unexpected errors occur.

🔧 Suggested fix
         } catch (Exception e) {
-            log.warn("Failed to post detailed issues as reply, will be included in annotations: {}", e.getMessage());
+            log.warn("Failed to post detailed issues as reply, will be included in annotations", e);
         }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudWebhookParser.java (1)

51-54: Potential issue: asInt() returns 0 for missing/invalid nodes.

If the pullrequest object exists but the id field is missing or invalid, asInt() returns 0, resulting in pullRequestId = "0". This could cause issues if "0" is treated as a valid PR ID downstream.

🔧 Suggested defensive check
         JsonNode pullRequest = payload.path("pullrequest");
         if (!pullRequest.isMissingNode()) {
-            pullRequestId = String.valueOf(pullRequest.path("id").asInt());
+            JsonNode prIdNode = pullRequest.path("id");
+            if (!prIdNode.isMissingNode() && prIdNode.isNumber()) {
+                pullRequestId = String.valueOf(prIdNode.asInt());
+            }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderWebhookController.java (1)

336-345: Generic job is not persisted to the database.

The job is instantiated with new Job() but never saved through jobService or jobRepository. When passed to webhookAsyncProcessor, the unsaved job will lack an ID and external ID, causing job tracking to fail.

The suggested fix requires creating a createGenericJob method in JobService (which doesn't currently exist). Consider adding one following the pattern of other creation methods:

🐛 Suggested approach

Add a method to JobService:

`@Transactional`
public Job createGenericJob(
        Project project,
        String title,
        JobTriggerSource triggerSource
) {
    Job job = new Job();
    job.setProject(project);
    job.setJobType(JobType.MANUAL_ANALYSIS);
    job.setTriggerSource(triggerSource);
    job.setTitle(title);
    job.setStatus(JobStatus.PENDING);
    
    job = jobRepository.save(job);
    addLog(job, JobLogLevel.INFO, "init", "Generic job created");
    
    return job;
}

Then in ProviderWebhookController line 336-345:

-            Job job = new Job();
-            job.setProject(project);
-            job.setJobType(JobType.MANUAL_ANALYSIS);
-            job.setTriggerSource(JobTriggerSource.WEBHOOK);
-            job.setTitle("Webhook: " + payload.eventType());
-            job.setStatus(JobStatus.PENDING);
-            // Save through service would be better, but for now:
-            return job;
+            return jobService.createGenericJob(
+                    project,
+                    "Webhook: " + payload.eventType(),
+                    JobTriggerSource.WEBHOOK
+            );
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java (1)

196-211: Missing completion event for non-IOException exceptions.

The AnalysisCompletedEvent is only published for IOException. If other exceptions occur (e.g., GeneralSecurityException propagating from VCS services, or RuntimeException from createOrUpdatePullRequest), no completion event is published.

Consider wrapping the try-catch more broadly or adding a generic exception handler:

🔧 Suggested approach
         } catch (IOException e) {
             log.error("IOException during PR analysis: {}", e.getMessage(), e);
             // ... existing handling ...
             return Map.of("status", "error", "message", e.getMessage());
+        } catch (Exception e) {
+            log.error("Unexpected error during PR analysis: {}", e.getMessage(), e);
+            publishAnalysisCompletedEvent(project, request, correlationId, startTime,
+                    AnalysisCompletedEvent.CompletionStatus.FAILED, 0, 0, e.getMessage());
+            throw e;
         } finally {
             analysisLockService.releaseLock(lockKey.get());
         }
🤖 Fix all issues with AI agents
In `@java-ecosystem/libs/analysis-api/.gitignore`:
- Around line 38-40: Remove the stray "index.ts" entry from the analysis-api
module's .gitignore because this is a Java/Maven module and that TypeScript
entry is likely copy-pasted; keep the `.env` and `server.log` entries intact and
only delete the "index.ts" line in the .gitignore for the analysis-api module.

In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`:
- Around line 122-135: The default method createOrUpdateDeltaIndex calls eventConsumer.accept(...) unguarded, which throws a NullPointerException when callers pass null. Guard the consumer before invoking accept, e.g., if (eventConsumer != null) eventConsumer.accept(...), or Optional.ofNullable(eventConsumer).ifPresent(c -> c.accept(...)), so the default implementation remains a safe no-op when eventConsumer is null.
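A hedged sketch of the suggested guard; emit() is a hypothetical helper, not the real default method:

```java
import java.util.Optional;
import java.util.function.Consumer;

// Null-guarded event emission, as the comment above suggests.
// Events.emit is a hypothetical helper used only for illustration.
class Events {
    static void emit(Consumer<String> eventConsumer, String message) {
        // Safe no-op when no consumer was supplied.
        Optional.ofNullable(eventConsumer).ifPresent(c -> c.accept(message));
    }
}
```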
- Around line 144-188: The comparison baseBranch.equals(targetBranch) in
shouldUseHybridRag can NPE if getBaseBranch() returns null; change it to a
null-safe comparison (e.g., use Objects.equals(baseBranch, targetBranch) or
explicitly check baseBranch == null before calling equals) so the logic safely
handles a null baseBranch and proceeds to the delta checks; keep the rest of the
method and HybridRagDecision construction the same so null baseBranch is
preserved in the decision when appropriate.
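The null-safe comparison the comment asks for, shown in isolation:

```java
import java.util.Objects;

// Objects.equals tolerates a null baseBranch where
// baseBranch.equals(targetBranch) would throw a NullPointerException.
class Branches {
    static boolean sameBranch(String baseBranch, String targetBranch) {
        return Objects.equals(baseBranch, targetBranch);
    }
}
```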

In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst`:
- Around line 1-3: Remove the generated build artifacts (e.g., createdFiles.lst
entries and the class files RagOperationsService.class,
RagOperationsService$HybridRagDecision.class, module-info.class) from the PR and
ensure they are not tracked: delete them from the index (git rm --cached or
remove and recommit) and add a rule to ignore build outputs (e.g., add target/
to .gitignore). Re-run the build to confirm these files are untracked, then
commit the updated .gitignore and the removal changes.

In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst`:
- Around line 1-2: Delete the generated Maven build artifact inputFiles.lst from
the repo and remove any other files under the target/maven-status tree; then add
a .gitignore entry to exclude target/ (or at minimum target/maven-status/** and
inputFiles.lst) so future build outputs aren’t committed. Commit the deletion
and the updated .gitignore together with a concise commit message like "chore:
remove generated target artifacts and ignore target/".

In `@java-ecosystem/libs/events/.gitignore`:
- Around line 38-40: Remove the stray index.ts entry from the .gitignore in the
events Java module: open the .gitignore file in java-ecosystem/libs/events,
delete the line containing "index.ts" (which is likely a copy-paste artifact),
and commit the change so the ignore file only contains Java-relevant entries
like .env and server.log.

In
`@java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/client/RagPipelineClient.java`:
- Around line 252-269: The URL path segments are not encoded; update
deleteDeltaIndex (and similarly deltaIndexExists, shouldUseHybrid, deleteIndex)
to properly escape workspace, project, and branch/deltaBranch before building
the request—either by using OkHttp's HttpUrl.Builder.addPathSegment(...) for
each segment (recommended) or by applying URLEncoder.encode(...,
StandardCharsets.UTF_8) to each segment; ensure the final Request.Builder uses
the encoded/constructed HttpUrl so special characters (like slashes) in branch
names do not break the path.
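A sketch of the URLEncoder alternative mentioned above (the HttpUrl.Builder route is the recommended one); note that URLEncoder targets form encoding, so the "+" it emits for spaces must be rewritten to "%20" before use in a URL path:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Per-segment encoding so slashes in branch names (e.g. "feature/x")
// do not split into extra path segments.
class PathSegments {
    static String encodeSegment(String segment) {
        // URLEncoder produces application/x-www-form-urlencoded output,
        // so spaces come back as "+" and must become "%20" for a path.
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```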

In
`@java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/RagOperationsServiceImpl.java`:
- Around line 589-594: The log construction in RagOperationsServiceImpl (inside
the code that calls eventConsumer.accept with Map.of and formats "Updating RAG
index from %s to %s") uses indexedCommit.substring(0, 7) and
currentCommit.substring(0, 7) which can throw StringIndexOutOfBoundsException
for short commit hashes; update the formatting to safely take the first up to 7
characters (e.g., compute a safe prefix using Math.min(commit.length(), 7) or a
helper like safePrefix(commit, 7)) for both indexedCommit and currentCommit
before calling eventConsumer.accept so the message never attempts to substring
past the string length.
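The safePrefix helper suggested above could be as small as:

```java
// Take at most n leading characters without risking
// StringIndexOutOfBoundsException on short commit hashes.
class Commits {
    static String safePrefix(String commit, int n) {
        if (commit == null) return "";
        return commit.substring(0, Math.min(commit.length(), n));
    }
}
```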

In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java`:
- Around line 487-507: The code prepends workspaceId to repoIdOrSlug even when
repoIdOrSlug is already a full "owner/repo", causing malformed paths; update the
branch-resolution logic in listBranches (and mirror the same change in
getBranchDiff) so that after the numeric-ID check you: 1) detect if repoIdOrSlug
already contains a '/' (or starts with '/') and use it as repoFullName (trim any
leading slash), 2) otherwise build repoFullName as workspaceId + "/" +
repoIdOrSlug; keep the numeric-ID branch that resolves full_name unchanged but
ensure trimming/validation so repoFullName is always a proper owner/repo string
before calling the GitHub branches API.

In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/CommentCommandRateLimitService.java`:
- Around line 45-47: The code assumes config.getCommentCommandsConfig() is
non-null in CommentCommandRateLimitService (used to read rateLimit and
windowMinutes and again later); guard these accesses by checking if
config.getCommentCommandsConfig() == null and handle gracefully—either
return/disable rate limiting early or apply safe defaults (e.g., rateLimit = 0
or configured global defaults) before using
getEffectiveRateLimit()/getEffectiveRateLimitWindowMinutes(); update the places
that currently call getCommentCommandsConfig() (the initialization where
rateLimit/windowMinutes are read and the later usage around the second block) to
fetch the config into a local variable, null-check it, and proceed only when
non-null or use the chosen defaults so command handling is not blocked by a
missing CommentCommandsConfig.

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/controller/ProjectController.java`:
- Around line 538-542: ProjectController currently parses
request.installationMethod() using InstallationMethod.valueOf which is
case-sensitive; normalize the incoming string (e.g., call
toUpperCase(Locale.ROOT) on request.installationMethod()) before passing to
InstallationMethod.valueOf to accept lowercase inputs and avoid
IllegalArgumentException due to case mismatches, still keeping the existing
try/catch around InstallationMethod.valueOf and assigning the result to the
installationMethod variable.
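A sketch of the suggested normalization; the InstallationMethod constants here are placeholders, since the real enum's values are not shown in this summary:

```java
import java.util.Locale;

// Case-insensitive enum parsing, as the comment above suggests.
// The enum constants are illustrative placeholders.
class Parse {
    enum InstallationMethod { WEBHOOK, MANUAL }

    static InstallationMethod parse(String raw) {
        // Normalize before valueOf so "webhook" and "WEBHOOK" both resolve.
        return InstallationMethod.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }
}
```

The real controller would keep its existing try/catch around the valueOf call, since unknown values still throw IllegalArgumentException.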

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/UpdateProjectRequest.java`:
- Around line 35-37: UpdateProjectRequest.getMainBranch currently only falls
back to defaultBranch when mainBranch is null but not when it's an empty or
whitespace string; change the logic to treat blank values as absent by returning
defaultBranch when mainBranch is null or blank (e.g., use String.isBlank() or
trim().isEmpty()), ensuring callers of getMainBranch() never receive
empty/whitespace branch names.
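The blank-aware fallback, sketched as a standalone helper:

```java
// Treat null, empty, and whitespace-only mainBranch as absent,
// falling back to the deprecated defaultBranch.
class BranchFallback {
    static String effectiveMainBranch(String mainBranch, String defaultBranch) {
        return (mainBranch == null || mainBranch.isBlank()) ? defaultBranch : mainBranch;
    }
}
```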

In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/service/ProjectService.java`:
- Around line 281-291: The update path currently accepts request.getMainBranch()
even when empty; change the guard in ProjectService (the updateProject flow
around request.getMainBranch()) to only proceed when mainBranch is non-null and
not blank (e.g., trim-empty), and otherwise skip updating the ProjectConfig (or
reject the request); update the same block that calls
ProjectConfig.ensureMainBranchInPatterns() and project.setConfiguration(cfg) so
you never store an empty main branch or inject invalid patterns.

In `@python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py`:
- Line 16: The root endpoint returns a hardcoded "1.0.0" while the FastAPI app
is created with version "1.1.0"; update the root endpoint implementation (the
function handling "/") to return the app.version value or to match "1.1.0" so
they stay consistent (refer to the FastAPI instance declared as app =
FastAPI(...)); prefer using app.version in the root response to prevent future
drift.

In `@python-ecosystem/rag-pipeline/src/rag_pipeline/core/delta_index_manager.py`:
- Around line 340-360: The loader is being given absolute filesystem paths so
metadata["path"] becomes absolute and later deletions/dedupe (which use
repo-relative changed_files) fail; instead, keep the existence check using
full_path (repo_path_obj / f) but pass repo-relative Path/str to
DocumentLoader.load_specific_files. Concretely, in the block that builds
file_paths and calls load_specific_files, change file_paths to contain the
original repo-relative entries (e.g., f or Path(f)) after verifying
full_path.exists() and full_path.is_file(), and update the other similar section
(the repeated block around the 503-549 area) the same way so load_specific_files
receives repo-relative paths while existence checks use full_path.
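A small sketch of the intended split, using hypothetical names mirroring the prompt (not the actual delta_index_manager API): absolute paths are used only for filesystem checks, while the loader receives repo-relative paths so metadata["path"] matches the changed_files entries used later for deletion/dedupe.

```python
from pathlib import Path

def select_indexable_files(repo_path: str, changed_files: list[str]) -> list[str]:
    """Return repo-relative paths suitable for DocumentLoader.load_specific_files.

    Existence checks use the absolute full_path; the returned entries stay
    repo-relative so downstream metadata["path"] stays comparable to changed_files.
    """
    repo_path_obj = Path(repo_path)
    file_paths: list[str] = []
    for f in changed_files:
        full_path = repo_path_obj / f          # absolute: filesystem checks only
        if full_path.exists() and full_path.is_file():
            file_paths.append(f)               # repo-relative: what the loader gets
    return file_paths
```

The same shape applies to the repeated block around the 503-549 area.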

In
`@python-ecosystem/rag-pipeline/src/rag_pipeline/services/hybrid_query_service.py`:
- Around line 297-452: The top_k parameter of get_hybrid_context_for_pr is never
used, so callers can’t control result breadth; update the retrieval calls so
per-query q_top_k is bounded by the function-level top_k: when iterating queries
(the queries variable produced by _decompose_queries), replace uses of q_top_k
passed into _query_collection with min(q_top_k, top_k) for base queries and for
delta queries use min(max(3, q_top_k // 2), top_k) (or otherwise ensure delta
never requests more than top_k), so _query_collection respects the top_k
argument; you can alternatively pass top_k into _decompose_queries if that
function should produce sized queries, but ensure get_hybrid_context_for_pr
enforces the top_k cap when calling _query_collection.
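The capping rule described above can be sketched as a small helper (names follow the prompt; the exact integration point in hybrid_query_service is up to the implementer):

```python
def plan_retrieval_sizes(q_top_k: int, top_k: int) -> tuple[int, int]:
    """Bound per-query retrieval sizes by the caller-facing top_k.

    q_top_k is the per-query size produced by query decomposition; base queries
    are capped at top_k, and delta queries request roughly half as much (with a
    floor of 3) but never more than top_k.
    """
    base_k = min(q_top_k, top_k)                 # base-index query size
    delta_k = min(max(3, q_top_k // 2), top_k)   # delta-index query size
    return base_k, delta_k
```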
🧹 Nitpick comments (32)
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/integration/dto/request/RepoOnboardRequest.java (1)

109-134: Keep legacy setter in sync with mainBranch to avoid divergent state.
If callers still use setDefaultBranch, mainBranch stays null. Consider syncing when mainBranch hasn’t been explicitly set.

♻️ Proposed refactor
 `@Deprecated`
 public void setDefaultBranch(String defaultBranch) {
     this.defaultBranch = defaultBranch;
+    if (this.mainBranch == null) {
+        this.mainBranch = defaultBranch;
+    }
 }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java (3)

39-40: Consider platform-specific comment length limits.

The MAX_REVIEW_LENGTH of 65,000 may exceed comment limits on some VCS platforms (e.g., Bitbucket's limit is approximately 32,768 characters). Consider making this configurable per VCS provider or using the lowest common denominator.


46-53: Consider injecting VcsConnectionCredentialsExtractor for better testability.

Instantiating VcsConnectionCredentialsExtractor internally couples this class to its implementation. For easier unit testing, consider injecting it as a dependency or providing a package-private constructor for tests.

♻️ Suggested refactor
 public ReviewCommandProcessor(
         AiCommandClient aiCommandClient,
-        TokenEncryptionService tokenEncryptionService
+        TokenEncryptionService tokenEncryptionService,
+        VcsConnectionCredentialsExtractor credentialsExtractor
 ) {
     this.aiCommandClient = aiCommandClient;
     this.tokenEncryptionService = tokenEncryptionService;
-    this.credentialsExtractor = new VcsConnectionCredentialsExtractor(tokenEncryptionService);
+    this.credentialsExtractor = credentialsExtractor;
 }

102-105: Consider sanitizing exception messages in error responses.

Directly including e.getMessage() in the error response could potentially expose internal details (e.g., connection strings, file paths, stack traces from nested causes) to end users. Consider using a generic error message while logging the full details.

♻️ Suggested approach
         } catch (Exception e) {
             log.error("Error generating review: {}", e.getMessage(), e);
-            return WebhookResult.error("Failed to generate code review: " + e.getMessage());
+            return WebhookResult.error("Failed to generate code review. Please check the logs for details.");
         }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubAiClientService.java (1)

87-92: Consider explicit case handling for safety.

The default case silently treats any non-BRANCH_ANALYSIS type as a PR analysis. If new AnalysisType values are introduced, this could lead to unexpected ClassCastException at runtime.

💡 Suggested defensive approach
 switch (request.getAnalysisType()) {
     case BRANCH_ANALYSIS:
         return buildBranchAnalysisRequest(project, (BranchProcessRequest) request, previousAnalysis);
-    default:
+    case PR_ANALYSIS:
+    case INCREMENTAL_PR_ANALYSIS:
         return buildPrAnalysisRequest(project, (PrProcessRequest) request, previousAnalysis);
+    default:
+        throw new IllegalArgumentException("Unsupported analysis type: " + request.getAnalysisType());
 }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubReportingService.java (2)

179-184: Consider defensive parsing for comment IDs.

Long.parseLong(placeholderCommentId) will throw NumberFormatException if the input is malformed. While callers should provide valid IDs, consider wrapping in a try-catch or using a utility method for consistency across lines 182, 322, and 348.

♻️ Suggested approach
private long parseCommentId(String commentId) {
    try {
        return Long.parseLong(commentId);
    } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid comment ID: " + commentId, e);
    }
}

Then use parseCommentId(placeholderCommentId) instead of Long.parseLong(...).


353-359: Acknowledged: Mermaid diagrams disabled pending validation.

The TODO explains the rationale well. When you're ready to re-enable Mermaid support, consider adding a validation step using a Mermaid parser/linter to catch syntax errors before posting, or implement a fallback to ASCII when validation fails.

Would you like me to open an issue to track implementing Mermaid syntax validation for GitHub comments?

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java (1)

121-131: Consider parsing pullRequestId once.

Long.parseLong(payload.pullRequestId()) is called three times in this method (lines 121, 126, 162). Extract to a local variable at the method entry to avoid redundant parsing.

♻️ Suggested refactor
     private WebhookResult handlePullRequestEvent(
             WebhookPayload payload,
             Project project,
             Consumer<Map<String, Object>> eventConsumer
     ) {
         String placeholderCommentId = null;
+        Long pullRequestId = Long.parseLong(payload.pullRequestId());
         
         try {
             // Post placeholder comment immediately to show analysis has started
-            placeholderCommentId = postPlaceholderComment(project, Long.parseLong(payload.pullRequestId()));
+            placeholderCommentId = postPlaceholderComment(project, pullRequestId);
             
             // Convert WebhookPayload to PrProcessRequest
             PrProcessRequest request = new PrProcessRequest();
             request.projectId = project.getId();
-            request.pullRequestId = Long.parseLong(payload.pullRequestId());
+            request.pullRequestId = pullRequestId;

And at line 162:

-                    updatePlaceholderWithError(project, Long.parseLong(payload.pullRequestId()), placeholderCommentId, e.getMessage());
+                    updatePlaceholderWithError(project, pullRequestId, placeholderCommentId, e.getMessage());
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/service/GitLabAiClientService.java (1)

87-92: Consider handling unknown analysis types explicitly.

The default case assumes any non-BRANCH_ANALYSIS type is a PR/MR request. If a new analysis type is added to the enum in the future, this would result in a ClassCastException at runtime.

♻️ Suggested improvement
 switch (request.getAnalysisType()) {
     case BRANCH_ANALYSIS:
         return buildBranchAnalysisRequest(project, (BranchProcessRequest) request, previousAnalysis);
-    default:
+    case MERGE_REQUEST_ANALYSIS:
+    case PULL_REQUEST_ANALYSIS:
         return buildMrAnalysisRequest(project, (PrProcessRequest) request, previousAnalysis);
+    default:
+        throw new IllegalArgumentException("Unsupported analysis type: " + request.getAnalysisType());
 }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/dto/webhook/WebhookPayload.java (1)

172-181: Consider caching the parsed command to avoid redundant parsing.

hasCodecrowCommand() and getCodecrowCommand() both call parseCommand(), which involves string operations. If both methods are called in sequence (common pattern: check then get), the command is parsed twice.

♻️ Suggested optimization

Since records are immutable, you could either:

  1. Accept the minor overhead (parsing is lightweight)
  2. Have callers use getCodecrowCommand() directly and null-check:
CodecrowCommand cmd = payload.getCodecrowCommand();
if (cmd != null) {
    // use cmd
}
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudWebhookParser.java (1)

120-122: Same issue with asLong() for comment IDs.

Similar to the PR ID issue, asLong() returns 0 for missing nodes. Consider adding a defensive check here as well, though this is lower risk since comment IDs are used for threading rather than entity lookup.

java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/PullRequestController.java (1)

82-94: Grouped map may not preserve group iteration order.

The code comment claims "maintaining order within each group"; within-group order is in fact preserved, since Collectors.toList() keeps encounter order and the input is already sorted by prNumberDesc. The real caveat is that Collectors.groupingBy with the default map supplier uses a HashMap, which doesn't guarantee iteration order of the keys.

So while the grouped variable is typed as Map<String, List<PullRequestDTO>>, the actual implementation from groupingBy is HashMap, and group (branch-name) order is unspecified. If you need the groups themselves in encounter order, supply a LinkedHashMap factory:

♻️ Optional: Use LinkedHashMap to preserve group encounter order
         Map<String, List<PullRequestDTO>> grouped = pullRequestList.stream()
                 .collect(Collectors.groupingBy(
                         pr -> pr.getTargetBranchName() == null ? "unknown" : pr.getTargetBranchName(),
+                        LinkedHashMap::new,
                         Collectors.mapping(pr -> {
                             CodeAnalysis analysis = codeAnalysisRepository
                                     .findByProjectIdAndPrNumberWithMaxPrVersion(project.getId(), pr.getPrNumber())
                                     .orElse(null);
                             return PullRequestDTO.fromPullRequestWithAnalysis(pr, analysis);
                         }, Collectors.toList())
                 ));
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/PipelineJobService.java (2)

324-335: Redundant pattern in failure detection.

The check for "failed to" already covers the case of "failed to parse", making the second condition redundant.

♻️ Remove redundant check
     private boolean isFailureComment(String comment) {
         if (comment == null) return false;
         String lower = comment.toLowerCase();
         return lower.contains("failed to") || 
-               lower.contains("failed to parse") ||
                lower.contains("unable to") ||
                lower.contains("cannot perform") ||
                lower.contains("error:") ||
                lower.contains("response parsing failed") ||
                lower.contains("agent returned intermediate tool results") ||
                lower.contains("agent reached its step limit");
     }

346-348: Redundant null check in log statement.

At line 347, the log statement guards job.getExternalId() with a ternary (job != null ? ... : "null") even though the early return at line 342 guarantees job is non-null on this code path, so the ternary is unnecessary.

♻️ Simplify the log statement
     public void failJob(Job job, String errorMessage) {
         if (job == null) return;
         
         try {
             jobService.failJob(job, errorMessage);
         } catch (Exception e) {
-            log.error("Error failing job {}", job != null ? job.getExternalId() : "null", e);
+            log.error("Error failing job {}", job.getExternalId(), e);
         }
     }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderPipelineActionController.java (1)

176-186: Use a dedicated executor for CompletableFuture.supplyAsync() to avoid blocking the common ForkJoinPool.

CompletableFuture.supplyAsync() without an explicit executor uses the common ForkJoinPool, which is not designed for blocking I/O operations like webhook processing. This can lead to thread starvation if multiple requests queue up. Create and inject a dedicated ThreadPoolTaskExecutor bean configured for webhook processing tasks, and pass it to supplyAsync(supplier, executor). Consider reusing this pattern in other async operations like RagIndexingController (line 74).

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/webhookhandler/WebhookHandler.java (1)

69-71: Consider importing HashMap instead of using fully qualified name.

Line 70 uses java.util.HashMap inline. For consistency with the import style in this file, consider adding an import statement.

♻️ Suggested fix

Add to imports:

import java.util.HashMap;

Then update line 70:

-            Map<String, Object> body = new java.util.HashMap<>(data);
+            Map<String, Object> body = new HashMap<>(data);
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/webhookhandler/WebhookHandlerFactory.java (1)

82-88: Consider importing ArrayList instead of using fully qualified name.

Line 83 uses java.util.ArrayList inline, similar to the pattern in WebhookHandler.java.

♻️ Suggested fix

Add to imports:

+import java.util.ArrayList;

Then update line 83:

-        List<WebhookHandler> result = new java.util.ArrayList<>(
+        List<WebhookHandler> result = new ArrayList<>(
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/webhookhandler/GitLabMergeRequestWebhookHandler.java (1)

123-128: Consider handling NumberFormatException explicitly for pullRequestId parsing.

Long.parseLong(payload.pullRequestId()) is called multiple times (lines 123, 128, 163). If pullRequestId is null or non-numeric, this throws NumberFormatException which gets caught by the generic exception handler, potentially masking the root cause.

Consider validating the ID early or providing a clearer error message.

♻️ Suggested improvement
     private WebhookResult handleMergeRequestEvent(
             WebhookPayload payload,
             Project project,
             Consumer<Map<String, Object>> eventConsumer
     ) {
         String placeholderCommentId = null;
+        Long mrId;
+        try {
+            mrId = Long.parseLong(payload.pullRequestId());
+        } catch (NumberFormatException e) {
+            log.error("Invalid MR ID format: {}", payload.pullRequestId());
+            return WebhookResult.error("Invalid merge request ID: " + payload.pullRequestId());
+        }
         
         try {
             // Post placeholder comment immediately to show analysis has started
-            placeholderCommentId = postPlaceholderComment(project, Long.parseLong(payload.pullRequestId()));
+            placeholderCommentId = postPlaceholderComment(project, mrId);
             
             // Convert WebhookPayload to PrProcessRequest
             PrProcessRequest request = new PrProcessRequest();
             request.projectId = project.getId();
-            request.pullRequestId = Long.parseLong(payload.pullRequestId());
+            request.pullRequestId = mrId;
java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/project/ProjectConfigChangedEvent.java (1)

26-36: Add correlationId constructor for event tracing parity.

Several event types in this module support correlation IDs; consider matching that pattern here for consistent tracing across event flows.

♻️ Proposed constructor overload
-    public ProjectConfigChangedEvent(Object source, Long projectId, String projectName,
-                                      ChangeType changeType, String changedField,
-                                      Object oldValue, Object newValue) {
-        super(source);
-        this.projectId = projectId;
-        this.projectName = projectName;
-        this.changeType = changeType;
-        this.changedField = changedField;
-        this.oldValue = oldValue;
-        this.newValue = newValue;
-    }
+    public ProjectConfigChangedEvent(Object source, Long projectId, String projectName,
+                                     ChangeType changeType, String changedField,
+                                     Object oldValue, Object newValue) {
+        this(source, null, projectId, projectName, changeType, changedField, oldValue, newValue);
+    }
+
+    public ProjectConfigChangedEvent(Object source, String correlationId, Long projectId, String projectName,
+                                     ChangeType changeType, String changedField,
+                                     Object oldValue, Object newValue) {
+        super(source, correlationId);
+        this.projectId = projectId;
+        this.projectName = projectName;
+        this.changeType = changeType;
+        this.changedField = changedField;
+        this.oldValue = oldValue;
+        this.newValue = newValue;
+    }
java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/analysis/AnalysisCompletedEvent.java (1)

27-42: Consider defensive copy for the metrics map to preserve immutability.

The metrics map is stored directly, allowing the caller to mutate event data after construction. For a proper immutable event, wrap it in an unmodifiable view.

♻️ Suggested fix
+import java.util.Collections;
+
 public AnalysisCompletedEvent(Object source, String correlationId, Long projectId, 
                                Long jobId, CompletionStatus status, Duration duration,
                                int issuesFound, int filesAnalyzed, String errorMessage,
                                Map<String, Object> metrics) {
     super(source, correlationId);
     this.projectId = projectId;
     this.jobId = jobId;
     this.status = status;
     this.duration = duration;
     this.issuesFound = issuesFound;
     this.filesAnalyzed = filesAnalyzed;
     this.errorMessage = errorMessage;
-    this.metrics = metrics;
+    this.metrics = metrics != null ? Collections.unmodifiableMap(metrics) : null;
 }
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderWebhookController.java (2)

186-188: Security: Signature validation TODO should be addressed before release.

The webhook endpoint without token accepts all webhooks for configured repos without cryptographic validation. This is a security gap that allows spoofed webhooks to trigger analysis jobs.

Would you like me to help implement signature validation for GitHub (X-Hub-Signature-256) and GitLab (X-Gitlab-Token) webhooks? This should be prioritized before the 1.2.0 release.
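For the GitHub half, the documented scheme is an X-Hub-Signature-256 header of the form "sha256=" plus the hex HMAC-SHA256 of the raw request body keyed with the webhook secret. A sketch of the check (Python for brevity; the production code would live in the Java webhook controller, e.g. via javax.crypto.Mac):

```python
import hashlib
import hmac

def verify_github_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Validate a GitHub X-Hub-Signature-256 header against the raw body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison prevents timing attacks on the signature check
    return hmac.compare_digest(expected, signature_header)
```

GitLab's X-Gitlab-Token is simpler: a constant-time equality check of the header against the stored token, no HMAC involved.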


309-312: Consider extracting PR merge detection into a helper method for readability.

The compound boolean expression spans multiple conditions and providers. Extracting this to a named method would improve clarity.

♻️ Suggested refactor
private boolean isPrMergeEvent(WebhookPayload payload) {
    String eventType = payload.eventType();
    if ("pullrequest:fulfilled".equals(eventType)) {
        return true; // Bitbucket
    }
    if ("pull_request.closed".equals(eventType) && payload.rawPayload() != null) {
        return payload.rawPayload().path("pull_request").path("merged").asBoolean(false); // GitHub
    }
    return false;
}
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/rag/RagDeltaIndexRepository.java (2)

27-28: Hardcoded status strings in JPQL queries are fragile.

Lines 27-28, 33-34, 38-40, 43-45, 55-56, and 58-60 use hardcoded 'READY', 'STALE', 'ARCHIVED' strings. If the DeltaIndexStatus enum values change, these queries will silently break. Consider using parameterized queries with the enum, similar to findByProjectIdAndStatus.

♻️ Example refactor for findReadyByProjectId
-@Query("SELECT d FROM RagDeltaIndex d WHERE d.project.id = :projectId AND d.status = 'READY'")
-List<RagDeltaIndex> findReadyByProjectId(`@Param`("projectId") Long projectId);
+@Query("SELECT d FROM RagDeltaIndex d WHERE d.project.id = :projectId AND d.status = :status")
+List<RagDeltaIndex> findReadyByProjectId(`@Param`("projectId") Long projectId, `@Param`("status") DeltaIndexStatus status);

Or use the derived query method pattern:

List<RagDeltaIndex> findByProjectIdAndStatus(Long projectId, DeltaIndexStatus status);

42-51: Consider adding @Transactional to @Modifying methods.

While Spring Data JPA often handles transactions automatically, explicitly annotating @Modifying methods with @Transactional ensures consistent behavior across different calling contexts.

`@Modifying`
`@Transactional`
`@Query`("UPDATE RagDeltaIndex d SET d.status = 'STALE', d.updatedAt = CURRENT_TIMESTAMP " +
       "WHERE d.project.id = :projectId AND d.baseBranch = :baseBranch AND d.status = 'READY'")
int markDeltasAsStale(`@Param`("projectId") Long projectId, `@Param`("baseBranch") String baseBranch);
java-ecosystem/libs/core/src/main/resources/db/migration/1.2.0/V1.2.0__add_rag_delta_index_table.sql (1)

24-28: Index on branch_name alone may have limited utility.

The index idx_rag_delta_branch on branch_name (line 27) without project_id will likely have low selectivity since branch names like main, master, develop are common across projects. Queries typically filter by project_id first.

Consider whether this index is needed, or replace with a more useful composite index.

python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py (2)

5-5: Avoid mutable default list for diff_snippets.

Use a default factory to avoid shared defaults across requests.

♻️ Proposed refactor
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
-    diff_snippets: Optional[List[str]] = []
+    diff_snippets: Optional[List[str]] = Field(default_factory=list)

Also applies to: 72-80
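The plain-Python pitfall motivating this convention, as a runnable illustration (function names are hypothetical; Field(default_factory=list) is the idiomatic Pydantic equivalent of the fixed variant):

```python
def add_snippet_buggy(snippet, diff_snippets=[]):
    # The default list is created once at definition time and shared
    # across every call that omits the argument.
    diff_snippets.append(snippet)
    return diff_snippets

def add_snippet_fixed(snippet, diff_snippets=None):
    # A sentinel default yields a fresh list per call.
    if diff_snippets is None:
        diff_snippets = []
    diff_snippets.append(snippet)
    return diff_snippets
```

Pydantic copies field defaults per model instance, so the model itself may not exhibit the bug, but default_factory makes the intent explicit and avoids relying on that behavior.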


371-442: Use exception chaining and logger.exception in new endpoints.

Catching broad Exception without chaining drops stack context, and returning raw error text can leak internals. Also, Line 548 uses an f-string without placeholders.

🛠️ Suggested pattern
-    except Exception as e:
-        logger.error(f"Error creating delta index: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        logger.exception("Error creating delta index")
+        raise HTTPException(status_code=500, detail="Internal server error") from e
@@
-            logger.warning(f"No base index available for hybrid query")
+            logger.warning("No base index available for hybrid query")
Apply the same pattern to the other new delta/hybrid endpoints for consistency.

Also applies to: 526-587

java-ecosystem/libs/events/src/main/java/module-info.java (1)

1-9: Module naming inconsistency.

The module name codecrow.events deviates from the established naming pattern used by other modules in this codebase (e.g., org.rostilos.codecrow.core, org.rostilos.codecrow.ragengine, org.rostilos.codecrow.analysisapi). Consider renaming to org.rostilos.codecrow.events for consistency.

♻️ Suggested fix
-module codecrow.events {
+module org.rostilos.codecrow.events {
     requires spring.context;
     requires org.rostilos.codecrow.core;

     exports org.rostilos.codecrow.events;
     exports org.rostilos.codecrow.events.analysis;
     exports org.rostilos.codecrow.events.rag;
     exports org.rostilos.codecrow.events.project;
 }

Note: If you rename the module, update the requires codecrow.events; statements in dependent modules (e.g., rag-engine/module-info.java at line 14) to requires org.rostilos.codecrow.events;.

java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/CreateProjectRequest.java (1)

29-36: Consider adding deprecation metadata for consistency.

The @Deprecated annotations lack the since and forRemoval attributes. Following the pattern used elsewhere in the codebase (e.g., ProjectVcsConnectionBinding uses @Deprecated(since = "2.0", forRemoval = true)), consider enriching the annotations:

♻️ Suggested improvement
     // optional main branch - the primary branch used as baseline for RAG indexing and analysis
     private String mainBranch;
     
     /**
      * `@deprecated` Use mainBranch instead
      */
-    `@Deprecated`
+    `@Deprecated`(since = "1.2.0", forRemoval = true)
     private String defaultBranch;

Apply the same pattern to the getter at line 88.

java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/RagConfig.java (1)

28-42: Consider consolidating constructor overloads using a builder or defaults.

The four constructors create maintenance overhead. As the record grows, consider using a builder pattern or reducing to the canonical constructor with null defaults.

java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/DeltaIndexService.java (1)

66-77: Consider using a more specific exception type instead of RuntimeException.

Wrapping IOException in a generic RuntimeException loses exception type information. A custom RagPipelineException or using UncheckedIOException would preserve more context for callers.

♻️ Optional improvement
-        } catch (IOException e) {
-            throw new RuntimeException("Failed to create delta index: " + e.getMessage(), e);
+        } catch (IOException e) {
+            throw new java.io.UncheckedIOException("Failed to create delta index: " + e.getMessage(), e);
         }
java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/RagOperationsServiceImpl.java (1)

358-383: Consider handling the case where delta index entity might not exist in error handler.

In the catch block, ragDeltaIndexRepository.findByProjectIdAndBranchName could return empty if the entity was never persisted (e.g., if an exception occurred before line 312). The ifPresent handles this, but the error path would silently skip updating the status.

This is minor since the job failure is still recorded, but logging when the entity isn't found would help debugging:

♻️ Optional improvement
             ragDeltaIndexRepository.findByProjectIdAndBranchName(project.getId(), deltaBranch)
-                    .ifPresent(di -> {
+                    .ifPresentOrElse(di -> {
                         di.setStatus(DeltaIndexStatus.FAILED);
                         di.setErrorMessage(e.getMessage());
                         di.setUpdatedAt(OffsetDateTime.now());
                         ragDeltaIndexRepository.save(di);
-                    });
+                    }, () -> log.debug("Delta index entity not found for failure update - may not have been persisted yet"));

Comment on lines +38 to +40
index.ts
.env
server.log No newline at end of file

⚠️ Potential issue | 🟡 Minor

index.ts seems out of place in a Java module.

This is a Java Maven module (analysis-api), but index.ts is a TypeScript entry point file. This appears to be copy-pasted from a Node.js project template. Consider removing it if not applicable to this module.

.env and server.log are reasonable entries for any project.

🤖 Prompt for AI Agents
In `@java-ecosystem/libs/analysis-api/.gitignore` around lines 38 - 40, Remove the
stray "index.ts" entry from the analysis-api module's .gitignore because this is
a Java/Maven module and that TypeScript entry is likely copy-pasted; keep the
`.env` and `server.log` entries intact and only delete the "index.ts" line in
the .gitignore for the analysis-api module.

Comment on lines 122 to 135
default void createOrUpdateDeltaIndex(
Project project,
String deltaBranch,
String baseBranch,
String deltaCommit,
String rawDiff,
Consumer<Map<String, Object>> eventConsumer
) {
// Default implementation does nothing - override in actual implementation
eventConsumer.accept(Map.of(
"type", "warning",
"message", "Delta index operations not implemented"
));
}

⚠️ Potential issue | 🟡 Minor

Guard eventConsumer to avoid default-method NPEs.
Line 131 calls eventConsumer.accept(...) unconditionally. If callers pass null, this will throw even though the default impl is supposed to be a safe no-op.

🛠️ Suggested fix
-        eventConsumer.accept(Map.of(
-            "type", "warning",
-            "message", "Delta index operations not implemented"
-        ));
+        if (eventConsumer != null) {
+            eventConsumer.accept(Map.of(
+                "type", "warning",
+                "message", "Delta index operations not implemented"
+            ));
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
default void createOrUpdateDeltaIndex(
Project project,
String deltaBranch,
String baseBranch,
String deltaCommit,
String rawDiff,
Consumer<Map<String, Object>> eventConsumer
) {
// Default implementation does nothing - override in actual implementation
eventConsumer.accept(Map.of(
"type", "warning",
"message", "Delta index operations not implemented"
));
}
default void createOrUpdateDeltaIndex(
Project project,
String deltaBranch,
String baseBranch,
String deltaCommit,
String rawDiff,
Consumer<Map<String, Object>> eventConsumer
) {
// Default implementation does nothing - override in actual implementation
if (eventConsumer != null) {
eventConsumer.accept(Map.of(
"type", "warning",
"message", "Delta index operations not implemented"
));
}
}
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`
around lines 122 - 135, The default method createOrUpdateDeltaIndex currently
calls eventConsumer.accept(...) unguarded which can NPE if callers pass null;
update the method to guard eventConsumer (e.g., if (eventConsumer != null)
eventConsumer.accept(...), or use Optional.ofNullable(eventConsumer).ifPresent(c
-> c.accept(...))) so the default implementation remains a safe no-op when
eventConsumer is null; reference the eventConsumer parameter in the
createOrUpdateDeltaIndex method and apply the null-check before invoking accept.

Comment on lines 144 to 188
default boolean isDeltaIndexReady(Project project, String branchName) {
    return false;
}

/**
 * Decision record for hybrid RAG usage.
 */
record HybridRagDecision(
        boolean useHybrid,
        String baseBranch,
        String targetBranch,
        boolean deltaAvailable,
        String reason
) {}

/**
 * Determine if hybrid RAG should be used for a PR.
 *
 * @param project The project
 * @param targetBranch The PR target branch
 * @return Decision about whether to use hybrid RAG
 */
default HybridRagDecision shouldUseHybridRag(Project project, String targetBranch) {
    if (!isRagEnabled(project)) {
        return new HybridRagDecision(false, null, targetBranch, false, "rag_disabled");
    }

    String baseBranch = getBaseBranch(project);

    // If target is the base branch, no need for hybrid
    if (baseBranch.equals(targetBranch)) {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "target_is_base");
    }

    // Check if delta is enabled and available
    if (!isDeltaIndexEnabled(project)) {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "delta_disabled");
    }

    boolean deltaReady = isDeltaIndexReady(project, targetBranch);
    if (deltaReady) {
        return new HybridRagDecision(true, baseBranch, targetBranch, true, "delta_available");
    } else {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "delta_not_ready");
    }
⚠️ Potential issue | 🟡 Minor

Avoid possible NPE in base/target comparison.
Line 174 assumes baseBranch is non-null; a misconfigured project could still return null and throw. Consider a null-safe comparison.

🛠️ Suggested fix
-import java.util.function.Consumer;
-import java.util.Map;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Consumer;
@@
-        if (baseBranch.equals(targetBranch)) {
+        if (Objects.equals(baseBranch, targetBranch)) {
             return new HybridRagDecision(false, baseBranch, targetBranch, false, "target_is_base");
         }
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`
around lines 144 - 188, The comparison baseBranch.equals(targetBranch) in
shouldUseHybridRag can NPE if getBaseBranch() returns null; change it to a
null-safe comparison (e.g., use Objects.equals(baseBranch, targetBranch) or
explicitly check baseBranch == null before calling equals) so the logic safely
handles a null baseBranch and proceeds to the delta checks; keep the rest of the
method and HybridRagDecision construction the same so null baseBranch is
preserved in the decision when appropriate.

Comment on lines 1 to 3
org/rostilos/codecrow/analysisapi/rag/RagOperationsService.class
module-info.class
org/rostilos/codecrow/analysisapi/rag/RagOperationsService$HybridRagDecision.class
⚠️ Potential issue | 🟡 Minor

Avoid committing build artifacts from target/.
These generated files should be removed from the PR and covered by .gitignore to keep VCS clean.

🧹 Suggested removal
-org/rostilos/codecrow/analysisapi/rag/RagOperationsService.class
-module-info.class
-org/rostilos/codecrow/analysisapi/rag/RagOperationsService$HybridRagDecision.class
📝 Committable suggestion


Suggested change
org/rostilos/codecrow/analysisapi/rag/RagOperationsService.class
module-info.class
org/rostilos/codecrow/analysisapi/rag/RagOperationsService$HybridRagDecision.class
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst`
around lines 1 - 3, Remove the generated build artifacts (e.g., createdFiles.lst
entries and the class files RagOperationsService.class,
RagOperationsService$HybridRagDecision.class, module-info.class) from the PR and
ensure they are not tracked: delete them from the index (git rm --cached or
remove and recommit) and add a rule to ignore build outputs (e.g., add target/
to .gitignore). Re-run the build to confirm these files are untracked, then
commit the updated .gitignore and the removal changes.

Comment on lines 1 to 2
/var/www/html/codecrow/codecrow-public/java-ecosystem/libs/analysis-api/src/main/java/module-info.java
/var/www/html/codecrow/codecrow-public/java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java
⚠️ Potential issue | 🟡 Minor

Remove generated build artifacts from version control.

This target/maven-status/.../inputFiles.lst is a build output and should not be committed. Please delete it and add a gitignore rule for target/ (or at least this path) to prevent future noise.

🧹 Suggested .gitignore entry
+**/target/
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst`
around lines 1 - 2, Delete the generated Maven build artifact inputFiles.lst
from the repo and remove any other files under the target/maven-status tree;
then add a .gitignore entry to exclude target/ (or at minimum
target/maven-status/** and inputFiles.lst) so future build outputs aren’t
committed. Commit the deletion and the updated .gitignore together with a
concise commit message like "chore: remove generated target artifacts and ignore
target/".

Comment on lines +35 to +37
public String getMainBranch() {
    return mainBranch != null ? mainBranch : defaultBranch;
}
⚠️ Potential issue | 🟡 Minor

Handle blank mainBranch to preserve fallback behavior.
If a client sends an empty string, fallback won’t happen and downstream code may receive an invalid branch name. Consider treating blank as absent.

Proposed fix
 public String getMainBranch() {
-    return mainBranch != null ? mainBranch : defaultBranch;
+    return (mainBranch != null && !mainBranch.isBlank()) ? mainBranch : defaultBranch;
 }
📝 Committable suggestion


Suggested change
public String getMainBranch() {
    return mainBranch != null ? mainBranch : defaultBranch;
}
public String getMainBranch() {
    return (mainBranch != null && !mainBranch.isBlank()) ? mainBranch : defaultBranch;
}
🤖 Prompt for AI Agents
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/UpdateProjectRequest.java`
around lines 35 - 37, UpdateProjectRequest.getMainBranch currently only falls
back to defaultBranch when mainBranch is null but not when it's an empty or
whitespace string; change the logic to treat blank values as absent by returning
defaultBranch when mainBranch is null or blank (e.g., use String.isBlank() or
trim().isEmpty()), ensuring callers of getMainBranch() never receive
empty/whitespace branch names.
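The intended fallback semantics can be sketched outside Java (Python used here for brevity; the helper name is illustrative, not part of the DTO):

```python
def effective_main_branch(main_branch, default_branch):
    """Fall back to default_branch when main_branch is None OR blank."""
    if main_branch is not None and main_branch.strip():
        return main_branch
    return default_branch
```

Treating whitespace-only input the same as absence is what keeps downstream callers from ever seeing an empty branch name.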

Comment on lines +281 to 291
// Update main branch in project configuration if provided
if (request.getMainBranch() != null) {
    var cfg = project.getConfiguration();
    boolean useLocal = cfg == null ? false : cfg.useLocalMcp();
    var branchAnalysis = cfg != null ? cfg.branchAnalysis() : null;
    var ragConfig = cfg != null ? cfg.ragConfig() : null;
    Boolean prAnalysisEnabled = cfg != null ? cfg.prAnalysisEnabled() : true;
    Boolean branchAnalysisEnabled = cfg != null ? cfg.branchAnalysisEnabled() : true;
    var installationMethod = cfg != null ? cfg.installationMethod() : null;
    var commentCommands = cfg != null ? cfg.commentCommands() : null;
    project.setConfiguration(new ProjectConfig(useLocal, request.getDefaultBranch(), branchAnalysis, ragConfig,
            prAnalysisEnabled, branchAnalysisEnabled, installationMethod, commentCommands));
    if (cfg == null) {
        cfg = new ProjectConfig(false, request.getMainBranch());
    } else {
        cfg.setMainBranch(request.getMainBranch());
    }
    cfg.ensureMainBranchInPatterns();
    project.setConfiguration(cfg);
}
⚠️ Potential issue | 🟡 Minor

Guard against blank mainBranch updates.

createProject filters blanks, but updateProject accepts them; that can store an empty main branch or inject invalid patterns.

🛠️ Proposed fix
-        if (request.getMainBranch() != null) {
+        if (request.getMainBranch() != null && !request.getMainBranch().isBlank()) {
             var cfg = project.getConfiguration();
             if (cfg == null) {
                 cfg = new ProjectConfig(false, request.getMainBranch());
             } else {
                 cfg.setMainBranch(request.getMainBranch());
             }
             cfg.ensureMainBranchInPatterns();
             project.setConfiguration(cfg);
         }
📝 Committable suggestion


Suggested change
// Update main branch in project configuration if provided
if (request.getMainBranch() != null) {
    var cfg = project.getConfiguration();
    boolean useLocal = cfg == null ? false : cfg.useLocalMcp();
    var branchAnalysis = cfg != null ? cfg.branchAnalysis() : null;
    var ragConfig = cfg != null ? cfg.ragConfig() : null;
    Boolean prAnalysisEnabled = cfg != null ? cfg.prAnalysisEnabled() : true;
    Boolean branchAnalysisEnabled = cfg != null ? cfg.branchAnalysisEnabled() : true;
    var installationMethod = cfg != null ? cfg.installationMethod() : null;
    var commentCommands = cfg != null ? cfg.commentCommands() : null;
    project.setConfiguration(new ProjectConfig(useLocal, request.getDefaultBranch(), branchAnalysis, ragConfig,
            prAnalysisEnabled, branchAnalysisEnabled, installationMethod, commentCommands));
    if (cfg == null) {
        cfg = new ProjectConfig(false, request.getMainBranch());
    } else {
        cfg.setMainBranch(request.getMainBranch());
    }
    cfg.ensureMainBranchInPatterns();
    project.setConfiguration(cfg);
}
// Update main branch in project configuration if provided
if (request.getMainBranch() != null && !request.getMainBranch().isBlank()) {
    var cfg = project.getConfiguration();
    if (cfg == null) {
        cfg = new ProjectConfig(false, request.getMainBranch());
    } else {
        cfg.setMainBranch(request.getMainBranch());
    }
    cfg.ensureMainBranchInPatterns();
    project.setConfiguration(cfg);
}
🤖 Prompt for AI Agents
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/service/ProjectService.java`
around lines 281 - 291, The update path currently accepts
request.getMainBranch() even when empty; change the guard in ProjectService (the
updateProject flow around request.getMainBranch()) to only proceed when
mainBranch is non-null and not blank (e.g., trim-empty), and otherwise skip
updating the ProjectConfig (or reject the request); update the same block that
calls ProjectConfig.ensureMainBranchInPatterns() and
project.setConfiguration(cfg) so you never store an empty main branch or inject
invalid patterns.

logger = logging.getLogger(__name__)

app = FastAPI(title="CodeCrow RAG API", version="1.0.0")
app = FastAPI(title="CodeCrow RAG API", version="1.1.0")
⚠️ Potential issue | 🟡 Minor

Align root version with app version.

The app is set to 1.1.0 (Line 16), but the root endpoint still reports 1.0.0. This can confuse clients.

🛠️ Proposed fix
 def root():
-    return {"message": "CodeCrow RAG Pipeline API", "version": "1.0.0"}
+    return {"message": "CodeCrow RAG Pipeline API", "version": "1.1.0"}

Also applies to: 127-129

🤖 Prompt for AI Agents
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py` at line 16, The
root endpoint returns a hardcoded "1.0.0" while the FastAPI app is created with
version "1.1.0"; update the root endpoint implementation (the function handling
"/") to return the app.version value or to match "1.1.0" so they stay consistent
(refer to the FastAPI instance declared as app = FastAPI(...)); prefer using
app.version in the root response to prevent future drift.
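A framework-free sketch of the single-source-of-truth idea (constant and function names are hypothetical; in the actual service the shared constant would feed both the FastAPI constructor and the "/" handler):

```python
# Hypothetical sketch: one shared constant prevents version drift between
# the app metadata and the root endpoint payload.
API_VERSION = "1.1.0"

def make_app_metadata(title: str = "CodeCrow RAG API") -> dict:
    # stand-in for FastAPI(title=title, version=API_VERSION)
    return {"title": title, "version": API_VERSION}

def root() -> dict:
    # mirrors the "/" handler; reads the shared constant, never a literal
    return {"message": "CodeCrow RAG Pipeline API", "version": API_VERSION}
```

Reading `app.version` in the handler achieves the same thing without introducing a constant.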

Comment on lines 340 to 360
# Convert to Path objects and filter existing files
file_paths = []
for f in changed_files:
    full_path = repo_path_obj / f
    if full_path.exists() and full_path.is_file():
        file_paths.append(full_path)
    else:
        logger.debug(f"Skipping non-existent file: {f}")

# Process in batches
for i in range(0, len(file_paths), DOCUMENT_BATCH_SIZE):
    batch = file_paths[i:i + DOCUMENT_BATCH_SIZE]

    documents = self.loader.load_specific_files(
        file_paths=batch,
        repo_base=repo_path_obj,
        workspace=workspace,
        project=project,
        branch=delta_branch,
        commit=delta_commit or "unknown"
    )
⚠️ Potential issue | 🟠 Major

Delta loader receives absolute paths (breaks metadata + deletion).
DocumentLoader.load_specific_files(...) expects repo‑relative paths; passing full paths makes metadata["path"] absolute, which then won’t match the relative changed_files used in deletions and hybrid dedupe.

🛠️ Suggested fix
-            file_paths = []
-            for f in changed_files:
-                full_path = repo_path_obj / f
-                if full_path.exists() and full_path.is_file():
-                    file_paths.append(full_path)
+            file_paths = []
+            for f in changed_files:
+                relative_path = Path(f)
+                full_path = repo_path_obj / relative_path
+                if full_path.exists() and full_path.is_file():
+                    file_paths.append(relative_path)
                 else:
                     logger.debug(f"Skipping non-existent file: {f}")
@@
-            file_paths = []
-            for f in changed_files:
-                full_path = repo_path_obj / f
-                if full_path.exists() and full_path.is_file():
-                    file_paths.append(full_path)
+            file_paths = []
+            for f in changed_files:
+                relative_path = Path(f)
+                full_path = repo_path_obj / relative_path
+                if full_path.exists() and full_path.is_file():
+                    file_paths.append(relative_path)

Also applies to: 503-549

🤖 Prompt for AI Agents
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/core/delta_index_manager.py`
around lines 340 - 360, The loader is being given absolute filesystem paths so
metadata["path"] becomes absolute and later deletions/dedupe (which use
repo-relative changed_files) fail; instead, keep the existence check using
full_path (repo_path_obj / f) but pass repo-relative Path/str to
DocumentLoader.load_specific_files. Concretely, in the block that builds
file_paths and calls load_specific_files, change file_paths to contain the
original repo-relative entries (e.g., f or Path(f)) after verifying
full_path.exists() and full_path.is_file(), and update the other similar section
(the repeated block around the 503-549 area) the same way so load_specific_files
receives repo-relative paths while existence checks use full_path.
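The relative-vs-absolute distinction can be isolated in a small helper (name and signature are illustrative, not the actual DeltaIndexManager API): existence is checked via the absolute path, but the repo-relative path is what gets kept.

```python
from pathlib import Path

def existing_relative_files(repo_root: str, changed_files: list) -> list:
    """Return repo-relative Paths for entries that exist under repo_root."""
    root = Path(repo_root)
    kept = []
    for name in changed_files:
        rel = Path(name)
        if (root / rel).is_file():  # check existence via the absolute path...
            kept.append(rel)        # ...but store the relative one
    return kept
```

Keeping entries relative means the stored `metadata["path"]` later matches the `changed_files` list used for deletions and hybrid dedupe.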

Comment on lines 297 to 452
def get_hybrid_context_for_pr(
    self,
    workspace: str,
    project: str,
    base_branch: str,
    target_branch: str,
    changed_files: List[str],
    diff_snippets: Optional[List[str]] = None,
    pr_title: Optional[str] = None,
    pr_description: Optional[str] = None,
    top_k: int = 15,
    enable_priority_reranking: bool = True,
    min_relevance_score: float = 0.7,
    delta_boost: float = DEFAULT_DELTA_BOOST
) -> HybridQueryResult:
    """
    Get PR context using hybrid retrieval from base + delta indexes.

    Args:
        workspace: Workspace identifier
        project: Project identifier
        base_branch: The base RAG index branch (e.g., "master")
        target_branch: The PR target branch (e.g., "release/1.0")
        changed_files: Files changed in the PR
        diff_snippets: Code snippets from the diff
        pr_title: PR title for semantic understanding
        pr_description: PR description
        top_k: Number of results to retrieve
        enable_priority_reranking: Apply priority-based boosting
        min_relevance_score: Minimum score threshold
        delta_boost: Score multiplier for delta results

    Returns:
        HybridQueryResult with combined context
    """
    diff_snippets = diff_snippets or []

    logger.info(
        f"Hybrid RAG query: base={base_branch}, target={target_branch}, "
        f"files={len(changed_files)}"
    )

    # Decompose queries
    queries = self._decompose_queries(
        pr_title=pr_title,
        pr_description=pr_description,
        diff_snippets=diff_snippets,
        changed_files=changed_files
    )

    # Collection names
    base_collection = self._get_base_collection_name(workspace, project, base_branch)
    delta_collection = self._get_delta_collection_name(workspace, project, target_branch)

    # Check what's available
    base_exists = self._collection_exists(base_collection)
    delta_exists = self._collection_exists(delta_collection)

    logger.info(f"Collections: base={base_exists}, delta={delta_exists}")

    all_base_results = []
    all_delta_results = []

    # Execute queries against both indexes
    for q_text, q_weight, q_top_k, q_instruction_type in queries:
        if not q_text.strip():
            continue

        # Query base index
        if base_exists:
            base_results = self._query_collection(
                base_collection, q_text, q_top_k, q_instruction_type
            )
            for r in base_results:
                r["_query_weight"] = q_weight
            all_base_results.extend(base_results)

        # Query delta index (with reduced top_k since it's supplementary)
        if delta_exists:
            delta_results = self._query_collection(
                delta_collection, q_text, max(3, q_top_k // 2), q_instruction_type
            )
            for r in delta_results:
                r["_query_weight"] = q_weight
            all_delta_results.extend(delta_results)

    # Merge results
    merged_results = self._merge_hybrid_results(
        base_results=all_base_results,
        delta_results=all_delta_results,
        delta_boost=delta_boost
    )

    # Apply priority reranking
    if enable_priority_reranking:
        final_results = self._apply_priority_reranking(
            merged_results,
            min_score_threshold=min_relevance_score
        )
    else:
        final_results = [r for r in merged_results if r['score'] >= min_relevance_score]
        final_results.sort(key=lambda x: x['score'], reverse=True)

    # Fallback if too strict
    if not final_results and merged_results:
        logger.info("Hybrid RAG: threshold too strict, using top raw results")
        seen = set()
        unique_fallback = []
        for r in sorted(merged_results, key=lambda x: x['score'], reverse=True):
            content_hash = f"{r['metadata'].get('path', '')}:{r['text'][:100]}"
            if content_hash not in seen:
                seen.add(content_hash)
                unique_fallback.append(r)
        final_results = unique_fallback[:5]

    # Collect related files
    related_files = set()
    for result in final_results:
        if "path" in result["metadata"]:
            related_files.add(result["metadata"]["path"])

    # Build response
    relevant_code = []
    for result in final_results:
        relevant_code.append({
            "text": result["text"],
            "score": result["score"],
            "metadata": result["metadata"],
            "_source": result.get("_source", "unknown")
        })

    # Count sources
    base_count = sum(1 for r in relevant_code if r.get("_source") == "base")
    delta_count = sum(1 for r in relevant_code if r.get("_source") == "delta")

    logger.info(
        f"Hybrid RAG: {len(relevant_code)} results "
        f"({base_count} base, {delta_count} delta) from {len(related_files)} files"
    )

    return HybridQueryResult(
        relevant_code=relevant_code,
        related_files=list(related_files),
        changed_files=changed_files,
        hybrid_metadata={
            "base_branch": base_branch,
            "target_branch": target_branch,
            "base_collection_used": base_exists,
            "delta_collection_used": delta_exists,
            "base_results_count": len(all_base_results),
            "delta_results_count": len(all_delta_results),
            "merged_results_count": len(merged_results),
            "final_results_count": len(final_results),
            "delta_boost_applied": delta_boost
        }
    )
⚠️ Potential issue | 🟡 Minor

top_k argument is currently ignored.
Line 307 exposes top_k, but it never affects retrieval sizes, so callers can’t tune result breadth.

🛠️ Suggested fix
-            if base_exists:
+            if base_exists:
+                base_top_k = min(q_top_k, top_k)
                 base_results = self._query_collection(
-                    base_collection, q_text, q_top_k, q_instruction_type
+                    base_collection, q_text, base_top_k, q_instruction_type
                 )
@@
-            if delta_exists:
+            if delta_exists:
+                delta_top_k = min(max(3, q_top_k // 2), max(3, top_k // 2))
                 delta_results = self._query_collection(
-                    delta_collection, q_text, max(3, q_top_k // 2), q_instruction_type
+                    delta_collection, q_text, delta_top_k, q_instruction_type
                 )
🧰 Tools
🪛 Ruff (0.14.13)

307-307: Unused method argument: top_k

(ARG002)

🤖 Prompt for AI Agents
In
`@python-ecosystem/rag-pipeline/src/rag_pipeline/services/hybrid_query_service.py`
around lines 297 - 452, The top_k parameter of get_hybrid_context_for_pr is
never used, so callers can’t control result breadth; update the retrieval calls
so per-query q_top_k is bounded by the function-level top_k: when iterating
queries (the queries variable produced by _decompose_queries), replace uses of
q_top_k passed into _query_collection with min(q_top_k, top_k) for base queries
and for delta queries use min(max(3, q_top_k // 2), top_k) (or otherwise ensure
delta never requests more than top_k), so _query_collection respects the top_k
argument; you can alternatively pass top_k into _decompose_queries if that
function should produce sized queries, but ensure get_hybrid_context_for_pr
enforces the top_k cap when calling _query_collection.
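The cap the comment asks for can be factored into a tiny helper (illustrative names; the real change would inline these expressions at the two `_query_collection` call sites):

```python
def plan_retrieval_sizes(q_top_k: int, top_k: int) -> tuple:
    """Bound per-query retrieval sizes by the caller-supplied top_k.

    Base queries never exceed top_k; delta queries stay supplementary
    (roughly half size, floor of 3) and never exceed half of top_k.
    """
    base_k = min(q_top_k, top_k)
    delta_k = min(max(3, q_top_k // 2), max(3, top_k // 2))
    return base_k, delta_k
```

With this in place, a caller passing `top_k=4` actually shrinks both retrieval sets instead of being silently ignored.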

rostilos and others added 9 commits January 20, 2026 12:10
…pport; refactor async processing configurations
Unit test suite. 80% coverage for the java-ecosystem libs packages
…d-search

Bugfix/branch list pagination and search
…uplication

- Removed the HybridQueryService and integrated its functionality into RAGQueryService.
- Enhanced RAGQueryService to support multi-branch queries, allowing simultaneous searches across target and base branches.
- Implemented deduplication logic to prioritize results from the target branch while preserving cross-file relationships.
- Updated utility functions to create project-level namespaces for collections.
- Improved error handling and logging for collection existence checks and fallback branch logic.
- Adjusted semantic search methods to accommodate branch filtering and excluded paths.
- Enhanced documentation and comments for clarity on new functionalities and changes.
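The deduplication rule described above (prefer the target-branch copy of a chunk, then fall back to score order) might look roughly like this sketch, assuming each result dict carries `score`, `text`, and `metadata` with `path` and `branch` keys (helper name is illustrative):

```python
def dedupe_prefer_target(results: list, target_branch: str) -> list:
    """Collapse duplicate chunks, keeping the target-branch copy when
    both branches returned the same (path, text-prefix) chunk."""
    ranked = sorted(
        results,
        key=lambda r: (r["metadata"].get("branch") != target_branch, -r["score"]),
    )
    seen, unique = set(), []
    for r in ranked:
        key = (r["metadata"].get("path"), r["text"][:100])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```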
feat: Enhance OpenRouterEmbedding with debug logging and batch size h…