1.2.0 rc #65
base: main
Conversation
- Implemented GitLabBranchWebhookHandler to handle push events for branch analysis.
- Created GitLabMergeRequestWebhookHandler to manage merge request events and trigger analysis.
- Added GitLabMrMergeWebhookHandler for handling merge events and updating issue status.
- Introduced GitLabWebhookParser to parse various GitLab webhook payloads.
- Updated ProjectController to return CommentCommandsConfig directly.
- Refactored ProjectService to utilize new configuration models for branch analysis and comment commands.
Delta indexes
Important: Review skipped. Too many files! 109 files out of 259 files are above the max files limit of 150. You can disable this status message by setting the …

📝 Walkthrough

This pull request introduces a hierarchical Retrieval-Augmented Generation (RAG) system with delta indexing, event-driven architecture, and modular configuration. It adds new analysis-api and events modules, refactors project configuration types, implements delta index persistence and management, restructures pipeline-agent package organization, and extends VCS clients with branch listing and diff retrieval capabilities.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java (2)
111-136: Error messages may be posted as reviews.

When buildReviewRequest returns null or exceptions occur, this method returns user-facing error strings (e.g., "⚠️ **Review Failed**..."). These pass the review.isBlank() check in process() and will be posted as if they were successful reviews.

Consider returning null or throwing an exception for errors, letting process() handle all error responses uniformly.

🐛 Suggested fix
 private String generateReview(Project project, WebhookPayload payload) {
     try {
         ReviewRequest request = buildReviewRequest(project, payload);
         if (request == null) {
             log.warn("Failed to build review request - missing AI or VCS configuration");
-            return "⚠️ **Review Failed**\n\nCould not generate review - missing AI or VCS configuration.";
+            return null;
         }
         log.info("Calling AI service to generate code review...");
         ReviewResult result = aiCommandClient.review(request, event -> {
             log.debug("AI review event: {}", event);
         });
         log.info("Code review generated successfully");
         return result.review();
     } catch (IOException e) {
         log.error("Failed to generate review via AI: {}", e.getMessage(), e);
-        return "⚠️ **Review Failed**\n\nError generating code review: " + e.getMessage();
+        return null;
     } catch (Exception e) {
         log.error("Unexpected error generating review: {}", e.getMessage(), e);
-        return "⚠️ **Review Failed**\n\nUnexpected error: " + e.getMessage();
+        return null;
     }
 }
163-165: Potential NumberFormatException when parsing pullRequestId.

If payload.pullRequestId() is non-null but contains a non-numeric string, Long.parseLong() will throw NumberFormatException. This exception propagates to generateReview() where it's caught by the generic Exception handler, but adding explicit handling would provide a clearer error message.

🐛 Suggested fix
- Long prId = payload.pullRequestId() != null
-     ? Long.parseLong(payload.pullRequestId())
-     : null;
+ Long prId = null;
+ if (payload.pullRequestId() != null) {
+     try {
+         prId = Long.parseLong(payload.pullRequestId());
+     } catch (NumberFormatException e) {
+         log.warn("Invalid PR ID format: {}", payload.pullRequestId());
+         return null;
+     }
+ }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java (1)
76-78: Add null check before calling payload.rawPayload().has("action").

Lines 76-78 call payload.rawPayload() without a null check. Evidence from ProviderWebhookController (line 311) shows that payload.rawPayload() can return null, yet this code immediately invokes .has("action") on the result, causing a potential NullPointerException. Add a null check or use safe navigation:

String action = payload.rawPayload() != null && payload.rawPayload().has("action")
        ? payload.rawPayload().get("action").asText()
        : null;

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/service/BitbucketReportingService.java (1)
203-205: Log the full exception, not just the message.

Catching Exception broadly and logging only e.getMessage() loses the stack trace, making debugging harder if unexpected errors occur.

🔧 Suggested fix
  } catch (Exception e) {
-     log.warn("Failed to post detailed issues as reply, will be included in annotations: {}", e.getMessage());
+     log.warn("Failed to post detailed issues as reply, will be included in annotations", e);
  }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudWebhookParser.java (1)
51-54: Potential issue: asInt() returns 0 for missing/invalid nodes.

If the pullrequest object exists but the id field is missing or invalid, asInt() returns 0, resulting in pullRequestId = "0". This could cause issues if "0" is treated as a valid PR ID downstream.

🔧 Suggested defensive check
  JsonNode pullRequest = payload.path("pullrequest");
  if (!pullRequest.isMissingNode()) {
-     pullRequestId = String.valueOf(pullRequest.path("id").asInt());
+     JsonNode prIdNode = pullRequest.path("id");
+     if (!prIdNode.isMissingNode() && prIdNode.isNumber()) {
+         pullRequestId = String.valueOf(prIdNode.asInt());
+     }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderWebhookController.java (1)
336-345: Generic job is not persisted to the database.

The job is instantiated with new Job() but never saved through jobService or jobRepository. When passed to webhookAsyncProcessor, the unsaved job will lack an ID and external ID, causing job tracking to fail.

The suggested fix requires creating a createGenericJob method in JobService (which doesn't currently exist). Consider adding one following the pattern of other creation methods:

🐛 Suggested approach
Add a method to JobService:

@Transactional
public Job createGenericJob(
        Project project,
        String title,
        JobTriggerSource triggerSource
) {
    Job job = new Job();
    job.setProject(project);
    job.setJobType(JobType.MANUAL_ANALYSIS);
    job.setTriggerSource(triggerSource);
    job.setTitle(title);
    job.setStatus(JobStatus.PENDING);
    job = jobRepository.save(job);
    addLog(job, JobLogLevel.INFO, "init", "Generic job created");
    return job;
}

Then in ProviderWebhookController lines 336-345:

- Job job = new Job();
- job.setProject(project);
- job.setJobType(JobType.MANUAL_ANALYSIS);
- job.setTriggerSource(JobTriggerSource.WEBHOOK);
- job.setTitle("Webhook: " + payload.eventType());
- job.setStatus(JobStatus.PENDING);
- // Save through service would be better, but for now:
- return job;
+ return jobService.createGenericJob(
+         project,
+         "Webhook: " + payload.eventType(),
+         JobTriggerSource.WEBHOOK
+ );

java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java (1)
196-211: Missing completion event for non-IOException exceptions.

The AnalysisCompletedEvent is only published for IOException. If other exceptions occur (e.g., GeneralSecurityException propagating from VCS services, or RuntimeException from createOrUpdatePullRequest), no completion event is published.

Consider wrapping the try-catch more broadly or adding a generic exception handler:
🔧 Suggested approach
  } catch (IOException e) {
      log.error("IOException during PR analysis: {}", e.getMessage(), e);
      // ... existing handling ...
      return Map.of("status", "error", "message", e.getMessage());
+ } catch (Exception e) {
+     log.error("Unexpected error during PR analysis: {}", e.getMessage(), e);
+     publishAnalysisCompletedEvent(project, request, correlationId, startTime,
+             AnalysisCompletedEvent.CompletionStatus.FAILED, 0, 0, e.getMessage());
+     throw e;
  } finally {
      analysisLockService.releaseLock(lockKey.get());
  }
🤖 Fix all issues with AI agents
In `@java-ecosystem/libs/analysis-api/.gitignore`:
- Around line 38-40: Remove the stray "index.ts" entry from the analysis-api
module's .gitignore because this is a Java/Maven module and that TypeScript
entry is likely copy-pasted; keep the `.env` and `server.log` entries intact and
only delete the "index.ts" line in the .gitignore for the analysis-api module.
In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`:
- Around line 122-135: The default method createOrUpdateDeltaIndex currently
calls eventConsumer.accept(...) unguarded which can NPE if callers pass null;
update the method to guard eventConsumer (e.g., if (eventConsumer != null)
eventConsumer.accept(...), or use Optional.ofNullable(eventConsumer).ifPresent(c
-> c.accept(...))) so the default implementation remains a safe no-op when
eventConsumer is null; reference the eventConsumer parameter in the
createOrUpdateDeltaIndex method and apply the null-check before invoking accept.
- Around line 144-188: The comparison baseBranch.equals(targetBranch) in
shouldUseHybridRag can NPE if getBaseBranch() returns null; change it to a
null-safe comparison (e.g., use Objects.equals(baseBranch, targetBranch) or
explicitly check baseBranch == null before calling equals) so the logic safely
handles a null baseBranch and proceeds to the delta checks; keep the rest of the
method and HybridRagDecision construction the same so null baseBranch is
preserved in the decision when appropriate.
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst`:
- Around line 1-3: Remove the generated build artifacts (e.g., createdFiles.lst
entries and the class files RagOperationsService.class,
RagOperationsService$HybridRagDecision.class, module-info.class) from the PR and
ensure they are not tracked: delete them from the index (git rm --cached or
remove and recommit) and add a rule to ignore build outputs (e.g., add target/
to .gitignore). Re-run the build to confirm these files are untracked, then
commit the updated .gitignore and the removal changes.
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst`:
- Around line 1-2: Delete the generated Maven build artifact inputFiles.lst from
the repo and remove any other files under the target/maven-status tree; then add
a .gitignore entry to exclude target/ (or at minimum target/maven-status/** and
inputFiles.lst) so future build outputs aren’t committed. Commit the deletion
and the updated .gitignore together with a concise commit message like "chore:
remove generated target artifacts and ignore target/".
In `@java-ecosystem/libs/events/.gitignore`:
- Around line 38-40: Remove the stray index.ts entry from the .gitignore in the
events Java module: open the .gitignore file in java-ecosystem/libs/events,
delete the line containing "index.ts" (which is likely a copy-paste artifact),
and commit the change so the ignore file only contains Java-relevant entries
like .env and server.log.
In
`@java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/client/RagPipelineClient.java`:
- Around line 252-269: The URL path segments are not encoded; update
deleteDeltaIndex (and similarly deltaIndexExists, shouldUseHybrid, deleteIndex)
to properly escape workspace, project, and branch/deltaBranch before building
the request—either by using OkHttp's HttpUrl.Builder.addPathSegment(...) for
each segment (recommended) or by applying URLEncoder.encode(...,
StandardCharsets.UTF_8) to each segment; ensure the final Request.Builder uses
the encoded/constructed HttpUrl so special characters (like slashes) in branch
names do not break the path.
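For reference, a minimal sketch of the HttpUrl.Builder approach described in the prompt above. The base URL, path layout, and values are assumptions for illustration, not the actual RagPipelineClient code:

import okhttp3.HttpUrl;
import okhttp3.Request;

public final class DeltaIndexUrlExample {
    public static void main(String[] args) {
        // Hypothetical values; a branch name may contain '/' characters.
        String workspace = "acme";
        String project = "web-server";
        String deltaBranch = "feature/delta-indexes";

        // addPathSegment() percent-encodes each segment individually, so
        // "feature/delta-indexes" becomes "feature%2Fdelta-indexes" instead
        // of introducing an extra path level.
        HttpUrl url = HttpUrl.parse("http://rag-pipeline:8000").newBuilder()
                .addPathSegment("delta-index")
                .addPathSegment(workspace)
                .addPathSegment(project)
                .addPathSegment(deltaBranch)
                .build();

        Request request = new Request.Builder().url(url).delete().build();
        System.out.println(request.url()); // .../delta-index/acme/web-server/feature%2Fdelta-indexes
    }
}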
In
`@java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/RagOperationsServiceImpl.java`:
- Around line 589-594: The log construction in RagOperationsServiceImpl (inside
the code that calls eventConsumer.accept with Map.of and formats "Updating RAG
index from %s to %s") uses indexedCommit.substring(0, 7) and
currentCommit.substring(0, 7) which can throw StringIndexOutOfBoundsException
for short commit hashes; update the formatting to safely take the first up to 7
characters (e.g., compute a safe prefix using Math.min(commit.length(), 7) or a
helper like safePrefix(commit, 7)) for both indexedCommit and currentCommit
before calling eventConsumer.accept so the message never attempts to substring
past the string length.
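A minimal sketch of the safe-prefix helper suggested above; the helper name and null fallback are illustrative, not existing code:

public final class CommitShortener {
    /** Returns at most the first n characters of s, tolerating null and short input. */
    static String safePrefix(String s, int n) {
        if (s == null) {
            return "unknown"; // assumed fallback for logging
        }
        return s.substring(0, Math.min(s.length(), n));
    }

    public static void main(String[] args) {
        System.out.println(safePrefix("4f2a9c81bd", 7)); // 4f2a9c8
        System.out.println(safePrefix("4f2", 7));        // 4f2 (no exception)
        System.out.println(safePrefix(null, 7));         // unknown
    }
}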
In
`@java-ecosystem/libs/vcs-client/src/main/java/org/rostilos/codecrow/vcsclient/github/GitHubClient.java`:
- Around line 487-507: The code prepends workspaceId to repoIdOrSlug even when
repoIdOrSlug is already a full "owner/repo", causing malformed paths; update the
branch-resolution logic in listBranches (and mirror the same change in
getBranchDiff) so that after the numeric-ID check you: 1) detect if repoIdOrSlug
already contains a '/' (or starts with '/') and use it as repoFullName (trim any
leading slash), 2) otherwise build repoFullName as workspaceId + "/" +
repoIdOrSlug; keep the numeric-ID branch that resolves full_name unchanged but
ensure trimming/validation so repoFullName is always a proper owner/repo string
before calling the GitHub branches API.
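A sketch of the branch-resolution rule described above. The method shape and names are assumptions standing in for the real GitHubClient code, which also has a numeric-ID branch not shown here:

public final class RepoNameResolver {
    /** Normalizes a repo identifier to "owner/repo" (numeric IDs resolved elsewhere). */
    static String resolveFullName(String workspaceId, String repoIdOrSlug) {
        String candidate = repoIdOrSlug.startsWith("/")
                ? repoIdOrSlug.substring(1)
                : repoIdOrSlug;
        // Already "owner/repo": use as-is instead of prepending the workspace.
        if (candidate.contains("/")) {
            return candidate;
        }
        return workspaceId + "/" + candidate;
    }

    public static void main(String[] args) {
        System.out.println(resolveFullName("acme", "repo"));       // acme/repo
        System.out.println(resolveFullName("acme", "other/repo")); // other/repo
    }
}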
In
`@java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/CommentCommandRateLimitService.java`:
- Around line 45-47: The code assumes config.getCommentCommandsConfig() is
non-null in CommentCommandRateLimitService (used to read rateLimit and
windowMinutes and again later); guard these accesses by checking if
config.getCommentCommandsConfig() == null and handle gracefully—either
return/disable rate limiting early or apply safe defaults (e.g., rateLimit = 0
or configured global defaults) before using
getEffectiveRateLimit()/getEffectiveRateLimitWindowMinutes(); update the places
that currently call getCommentCommandsConfig() (the initialization where
rateLimit/windowMinutes are read and the later usage around the second block) to
fetch the config into a local variable, null-check it, and proceed only when
non-null or use the chosen defaults so command handling is not blocked by a
missing CommentCommandsConfig.
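A sketch of the null-safe pattern described above, with hypothetical config shapes and default values standing in for the real classes:

// Hypothetical record standing in for the real CommentCommandsConfig.
record CommentCommandsConfig(int rateLimit, int windowMinutes) {}

public final class RateLimitGuardExample {
    static final int DEFAULT_RATE_LIMIT = 10;      // assumed global default
    static final int DEFAULT_WINDOW_MINUTES = 60;  // assumed global default

    static int[] effectiveLimits(CommentCommandsConfig config) {
        // Fetch once into a local, null-check, then fall back to defaults
        // so a missing config never blocks command handling.
        if (config == null) {
            return new int[] { DEFAULT_RATE_LIMIT, DEFAULT_WINDOW_MINUTES };
        }
        return new int[] { config.rateLimit(), config.windowMinutes() };
    }

    public static void main(String[] args) {
        int[] limits = effectiveLimits(null);
        System.out.println(limits[0] + " requests per " + limits[1] + " minutes");
    }
}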
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/controller/ProjectController.java`:
- Around line 538-542: ProjectController currently parses
request.installationMethod() using InstallationMethod.valueOf which is
case-sensitive; normalize the incoming string (e.g., call
toUpperCase(Locale.ROOT) on request.installationMethod()) before passing to
InstallationMethod.valueOf to accept lowercase inputs and avoid
IllegalArgumentException due to case mismatches, still keeping the existing
try/catch around InstallationMethod.valueOf and assigning the result to the
installationMethod variable.
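A sketch of the case-insensitive parse suggested above; the enum values here are assumptions, not the real InstallationMethod:

import java.util.Locale;

public final class InstallationMethodParseExample {
    // Hypothetical values; the real enum lives in the web-server module.
    enum InstallationMethod { GITHUB_APP, WEBHOOK, MANUAL }

    static InstallationMethod parse(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        try {
            // Normalize before valueOf so "webhook" and "WEBHOOK" both parse.
            return InstallationMethod.valueOf(raw.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null; // unknown value; caller decides how to report it
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("webhook"));    // WEBHOOK
        System.out.println(parse("GitHub_App")); // GITHUB_APP
    }
}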
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/UpdateProjectRequest.java`:
- Around line 35-37: UpdateProjectRequest.getMainBranch currently only falls
back to defaultBranch when mainBranch is null but not when it's an empty or
whitespace string; change the logic to treat blank values as absent by returning
defaultBranch when mainBranch is null or blank (e.g., use String.isBlank() or
trim().isEmpty()), ensuring callers of getMainBranch() never receive
empty/whitespace branch names.
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/service/ProjectService.java`:
- Around line 281-291: The update path currently accepts request.getMainBranch()
even when empty; change the guard in ProjectService (the updateProject flow
around request.getMainBranch()) to only proceed when mainBranch is non-null and
not blank (e.g., trim-empty), and otherwise skip updating the ProjectConfig (or
reject the request); update the same block that calls
ProjectConfig.ensureMainBranchInPatterns() and project.setConfiguration(cfg) so
you never store an empty main branch or inject invalid patterns.
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py`:
- Line 16: The root endpoint returns a hardcoded "1.0.0" while the FastAPI app
is created with version "1.1.0"; update the root endpoint implementation (the
function handling "/") to return the app.version value or to match "1.1.0" so
they stay consistent (refer to the FastAPI instance declared as app =
FastAPI(...)); prefer using app.version in the root response to prevent future
drift.
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/core/delta_index_manager.py`:
- Around line 340-360: The loader is being given absolute filesystem paths so
metadata["path"] becomes absolute and later deletions/dedupe (which use
repo-relative changed_files) fail; instead, keep the existence check using
full_path (repo_path_obj / f) but pass repo-relative Path/str to
DocumentLoader.load_specific_files. Concretely, in the block that builds
file_paths and calls load_specific_files, change file_paths to contain the
original repo-relative entries (e.g., f or Path(f)) after verifying
full_path.exists() and full_path.is_file(), and update the other similar section
(the repeated block around the 503-549 area) the same way so load_specific_files
receives repo-relative paths while existence checks use full_path.
In
`@python-ecosystem/rag-pipeline/src/rag_pipeline/services/hybrid_query_service.py`:
- Around line 297-452: The top_k parameter of get_hybrid_context_for_pr is never
used, so callers can’t control result breadth; update the retrieval calls so
per-query q_top_k is bounded by the function-level top_k: when iterating queries
(the queries variable produced by _decompose_queries), replace uses of q_top_k
passed into _query_collection with min(q_top_k, top_k) for base queries and for
delta queries use min(max(3, q_top_k // 2), top_k) (or otherwise ensure delta
never requests more than top_k), so _query_collection respects the top_k
argument; you can alternatively pass top_k into _decompose_queries if that
function should produce sized queries, but ensure get_hybrid_context_for_pr
enforces the top_k cap when calling _query_collection.
🧹 Nitpick comments (32)
java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/integration/dto/request/RepoOnboardRequest.java (1)
109-134: Keep legacy setter in sync with mainBranch to avoid divergent state.
If callers still use setDefaultBranch, mainBranch stays null. Consider syncing when mainBranch hasn’t been explicitly set.

♻️ Proposed refactor
 @Deprecated
 public void setDefaultBranch(String defaultBranch) {
     this.defaultBranch = defaultBranch;
+    if (this.mainBranch == null) {
+        this.mainBranch = defaultBranch;
+    }
 }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/processor/command/ReviewCommandProcessor.java (3)
39-40: Consider platform-specific comment length limits.

The MAX_REVIEW_LENGTH of 65,000 may exceed comment limits on some VCS platforms (e.g., Bitbucket's limit is approximately 32,768 characters). Consider making this configurable per VCS provider or using the lowest common denominator, as sketched below.
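One way to express the per-provider idea; a sketch in which the map keys, limit values, and truncation suffix are all assumptions (platform maxima vary and should be verified):

import java.util.Map;

public final class CommentLimitExample {
    // Illustrative per-provider caps; the exact numbers are assumptions.
    private static final Map<String, Integer> MAX_COMMENT_LENGTH = Map.of(
            "github", 65_000,
            "gitlab", 65_000,
            "bitbucket", 32_000
    );

    static String truncateForProvider(String provider, String review) {
        int limit = MAX_COMMENT_LENGTH.getOrDefault(provider, 32_000); // safe default
        if (review.length() <= limit) {
            return review;
        }
        String suffix = "\n\n*[Review truncated due to length limits]*";
        return review.substring(0, limit - suffix.length()) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(truncateForProvider("bitbucket", "short review"));
    }
}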
46-53: Consider injecting VcsConnectionCredentialsExtractor for better testability.

Instantiating VcsConnectionCredentialsExtractor internally couples this class to its implementation. For easier unit testing, consider injecting it as a dependency or providing a package-private constructor for tests.

♻️ Suggested refactor
 public ReviewCommandProcessor(
         AiCommandClient aiCommandClient,
-        TokenEncryptionService tokenEncryptionService
+        TokenEncryptionService tokenEncryptionService,
+        VcsConnectionCredentialsExtractor credentialsExtractor
 ) {
     this.aiCommandClient = aiCommandClient;
     this.tokenEncryptionService = tokenEncryptionService;
-    this.credentialsExtractor = new VcsConnectionCredentialsExtractor(tokenEncryptionService);
+    this.credentialsExtractor = credentialsExtractor;
 }
102-105: Consider sanitizing exception messages in error responses.

Directly including e.getMessage() in the error response could potentially expose internal details (e.g., connection strings, file paths, stack traces from nested causes) to end users. Consider using a generic error message while logging the full details.

♻️ Suggested approach
  } catch (Exception e) {
      log.error("Error generating review: {}", e.getMessage(), e);
-     return WebhookResult.error("Failed to generate code review: " + e.getMessage());
+     return WebhookResult.error("Failed to generate code review. Please check the logs for details.");
  }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubAiClientService.java (1)
87-92: Consider explicit case handling for safety.

The default case silently treats any non-BRANCH_ANALYSIS type as a PR analysis. If new AnalysisType values are introduced, this could lead to an unexpected ClassCastException at runtime.

💡 Suggested defensive approach
 switch (request.getAnalysisType()) {
     case BRANCH_ANALYSIS:
         return buildBranchAnalysisRequest(project, (BranchProcessRequest) request, previousAnalysis);
+    case PR_ANALYSIS:
+    case INCREMENTAL_PR_ANALYSIS:
         return buildPrAnalysisRequest(project, (PrProcessRequest) request, previousAnalysis);
+    default:
+        throw new IllegalArgumentException("Unsupported analysis type: " + request.getAnalysisType());
 }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/service/GitHubReportingService.java (2)
179-184: Consider defensive parsing for comment IDs.

Long.parseLong(placeholderCommentId) will throw NumberFormatException if the input is malformed. While callers should provide valid IDs, consider wrapping in a try-catch or using a utility method for consistency across lines 182, 322, and 348.

♻️ Suggested approach
private long parseCommentId(String commentId) {
    try {
        return Long.parseLong(commentId);
    } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid comment ID: " + commentId, e);
    }
}

Then use parseCommentId(placeholderCommentId) instead of Long.parseLong(...).
353-359: Acknowledged: Mermaid diagrams disabled pending validation.

The TODO explains the rationale well. When you're ready to re-enable Mermaid support, consider adding a validation step using a Mermaid parser/linter to catch syntax errors before posting, or implement a fallback to ASCII when validation fails.
Would you like me to open an issue to track implementing Mermaid syntax validation for GitHub comments?
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/github/webhookhandler/GitHubPullRequestWebhookHandler.java (1)
121-131: Consider parsing pullRequestId once.

Long.parseLong(payload.pullRequestId()) is called three times in this method (lines 121, 126, 162). Extract to a local variable at the method entry to avoid redundant parsing.

♻️ Suggested refactor
 private WebhookResult handlePullRequestEvent(
         WebhookPayload payload,
         Project project,
         Consumer<Map<String, Object>> eventConsumer
 ) {
     String placeholderCommentId = null;
+    Long pullRequestId = Long.parseLong(payload.pullRequestId());
     try {
         // Post placeholder comment immediately to show analysis has started
-        placeholderCommentId = postPlaceholderComment(project, Long.parseLong(payload.pullRequestId()));
+        placeholderCommentId = postPlaceholderComment(project, pullRequestId);

         // Convert WebhookPayload to PrProcessRequest
         PrProcessRequest request = new PrProcessRequest();
         request.projectId = project.getId();
-        request.pullRequestId = Long.parseLong(payload.pullRequestId());
+        request.pullRequestId = pullRequestId;

And at line 162:

- updatePlaceholderWithError(project, Long.parseLong(payload.pullRequestId()), placeholderCommentId, e.getMessage());
+ updatePlaceholderWithError(project, pullRequestId, placeholderCommentId, e.getMessage());

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/service/GitLabAiClientService.java (1)
87-92: Consider handling unknown analysis types explicitly.

The default case assumes any non-BRANCH_ANALYSIS type is a PR/MR request. If a new analysis type is added to the enum in the future, this would result in a ClassCastException at runtime.

♻️ Suggested improvement
 switch (request.getAnalysisType()) {
     case BRANCH_ANALYSIS:
         return buildBranchAnalysisRequest(project, (BranchProcessRequest) request, previousAnalysis);
-    default:
+    case MERGE_REQUEST_ANALYSIS:
+    case PULL_REQUEST_ANALYSIS:
         return buildMrAnalysisRequest(project, (PrProcessRequest) request, previousAnalysis);
+    default:
+        throw new IllegalArgumentException("Unsupported analysis type: " + request.getAnalysisType());
 }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/dto/webhook/WebhookPayload.java (1)
172-181: Consider caching the parsed command to avoid redundant parsing.

hasCodecrowCommand() and getCodecrowCommand() both call parseCommand(), which involves string operations. If both methods are called in sequence (common pattern: check then get), the command is parsed twice.

♻️ Suggested optimization
Since records are immutable, you could either:
- Accept the minor overhead (parsing is lightweight)
- Have callers use getCodecrowCommand() directly and null-check:

CodecrowCommand cmd = payload.getCodecrowCommand();
if (cmd != null) {
    // use cmd
}

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/bitbucket/webhookhandler/BitbucketCloudWebhookParser.java (1)
120-122: Same issue with asLong() for comment IDs.

Similar to the PR ID issue, asLong() returns 0 for missing nodes. Consider adding a defensive check here as well, though this is lower risk since comment IDs are used for threading rather than entity lookup.

java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/analysis/controller/PullRequestController.java (1)
82-94: Grouped map may not preserve order within each group.

The comment claims "maintaining order within each group", but Collectors.groupingBy with default downstream collectors uses a HashMap which doesn't guarantee iteration order. The input is ordered by prNumberDesc, but the resulting List within each group depends on encounter order, which should be preserved by Collectors.toList().

However, the grouped variable is typed as Map<String, List<PullRequestDTO>>, and the actual map implementation from groupingBy is HashMap. If you need guaranteed order of groups (by branch name) as well, consider using LinkedHashMap:

♻️ Optional: Use LinkedHashMap to preserve group encounter order
 Map<String, List<PullRequestDTO>> grouped = pullRequestList.stream()
     .collect(Collectors.groupingBy(
         pr -> pr.getTargetBranchName() == null ? "unknown" : pr.getTargetBranchName(),
+        LinkedHashMap::new,
         Collectors.mapping(pr -> {
             CodeAnalysis analysis = codeAnalysisRepository
                 .findByProjectIdAndPrNumberWithMaxPrVersion(project.getId(), pr.getPrNumber())
                 .orElse(null);
             return PullRequestDTO.fromPullRequestWithAnalysis(pr, analysis);
         }, Collectors.toList())
     ));

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/service/PipelineJobService.java (2)
324-335: Redundant pattern in failure detection.

The check for "failed to" already covers the case of "failed to parse", making the second condition redundant.

♻️ Remove redundant check
 private boolean isFailureComment(String comment) {
     if (comment == null) return false;
     String lower = comment.toLowerCase();
     return lower.contains("failed to") ||
-           lower.contains("failed to parse") ||
            lower.contains("unable to") ||
            lower.contains("cannot perform") ||
            lower.contains("error:") ||
            lower.contains("response parsing failed") ||
            lower.contains("agent returned intermediate tool results") ||
            lower.contains("agent reached its step limit");
 }
346-348: Redundant null check in log statement.

At line 347, job.getExternalId() is guarded by a null ternary even though this code path is only reached when job != null (early return at line 342), so the ternary is unnecessary.

♻️ Simplify the log statement
 public void failJob(Job job, String errorMessage) {
     if (job == null) return;
     try {
         jobService.failJob(job, errorMessage);
     } catch (Exception e) {
-        log.error("Error failing job {}", job != null ? job.getExternalId() : "null", e);
+        log.error("Error failing job {}", job.getExternalId(), e);
     }
 }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderPipelineActionController.java (1)
176-186: Use a dedicated executor for CompletableFuture.supplyAsync() to avoid blocking the common ForkJoinPool.

CompletableFuture.supplyAsync() without an explicit executor uses the common ForkJoinPool, which is not designed for blocking I/O operations like webhook processing. This can lead to thread starvation if multiple requests queue up. Create and inject a dedicated ThreadPoolTaskExecutor bean configured for webhook processing tasks, and pass it to supplyAsync(supplier, executor). Consider reusing this pattern in other async operations like RagIndexingController (line 74); see the sketch below.
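A sketch of the dedicated-executor pattern using Spring's ThreadPoolTaskExecutor; pool sizes, the bean name, and the usage comment are illustrative assumptions:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
class WebhookExecutorConfig {
    @Bean(name = "webhookExecutor")
    ThreadPoolTaskExecutor webhookExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);       // illustrative sizing
        executor.setMaxPoolSize(16);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("webhook-");
        executor.initialize();
        return executor;
    }
}

// Usage (hypothetical call site): pass the executor so blocking webhook work
// stays off the common ForkJoinPool:
// CompletableFuture.supplyAsync(() -> processWebhook(payload), webhookExecutor);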
java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/webhookhandler/WebhookHandler.java (1)

69-71: Consider importing HashMap instead of using the fully qualified name.

Line 70 uses java.util.HashMap inline. For consistency with the import style in this file, consider adding an import statement.

♻️ Suggested fix
Add to imports:
import java.util.HashMap;

Then update line 70:
- Map<String, Object> body = new java.util.HashMap<>(data);
+ Map<String, Object> body = new HashMap<>(data);

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/webhookhandler/WebhookHandlerFactory.java (1)
82-88: Consider importing ArrayList instead of using the fully qualified name.

Line 83 uses java.util.ArrayList inline, similar to the pattern in WebhookHandler.java.

♻️ Suggested fix
Add to imports:
+import java.util.ArrayList;

Then update line 83:
- List<WebhookHandler> result = new java.util.ArrayList<>(
+ List<WebhookHandler> result = new ArrayList<>(

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/gitlab/webhookhandler/GitLabMergeRequestWebhookHandler.java (1)
123-128: Consider handling NumberFormatException explicitly for pullRequestId parsing.

Long.parseLong(payload.pullRequestId()) is called multiple times (lines 123, 128, 163). If pullRequestId is null or non-numeric, this throws NumberFormatException, which gets caught by the generic exception handler, potentially masking the root cause.

Consider validating the ID early or providing a clearer error message.
♻️ Suggested improvement
 private WebhookResult handleMergeRequestEvent(
         WebhookPayload payload,
         Project project,
         Consumer<Map<String, Object>> eventConsumer
 ) {
     String placeholderCommentId = null;
+    Long mrId;
+    try {
+        mrId = Long.parseLong(payload.pullRequestId());
+    } catch (NumberFormatException e) {
+        log.error("Invalid MR ID format: {}", payload.pullRequestId());
+        return WebhookResult.error("Invalid merge request ID: " + payload.pullRequestId());
+    }
     try {
         // Post placeholder comment immediately to show analysis has started
-        placeholderCommentId = postPlaceholderComment(project, Long.parseLong(payload.pullRequestId()));
+        placeholderCommentId = postPlaceholderComment(project, mrId);

         // Convert WebhookPayload to PrProcessRequest
         PrProcessRequest request = new PrProcessRequest();
         request.projectId = project.getId();
-        request.pullRequestId = Long.parseLong(payload.pullRequestId());
+        request.pullRequestId = mrId;

java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/project/ProjectConfigChangedEvent.java (1)
26-36: Add correlationId constructor for event tracing parity.

Several event types in this module support correlation IDs; consider matching that pattern here for consistent tracing across event flows.
♻️ Proposed constructor overload
- public ProjectConfigChangedEvent(Object source, Long projectId, String projectName,
-                                  ChangeType changeType, String changedField,
-                                  Object oldValue, Object newValue) {
-     super(source);
-     this.projectId = projectId;
-     this.projectName = projectName;
-     this.changeType = changeType;
-     this.changedField = changedField;
-     this.oldValue = oldValue;
-     this.newValue = newValue;
- }
+ public ProjectConfigChangedEvent(Object source, Long projectId, String projectName,
+                                  ChangeType changeType, String changedField,
+                                  Object oldValue, Object newValue) {
+     this(source, null, projectId, projectName, changeType, changedField, oldValue, newValue);
+ }
+
+ public ProjectConfigChangedEvent(Object source, String correlationId, Long projectId, String projectName,
+                                  ChangeType changeType, String changedField,
+                                  Object oldValue, Object newValue) {
+     super(source, correlationId);
+     this.projectId = projectId;
+     this.projectName = projectName;
+     this.changeType = changeType;
+     this.changedField = changedField;
+     this.oldValue = oldValue;
+     this.newValue = newValue;
+ }

java-ecosystem/libs/events/src/main/java/org/rostilos/codecrow/events/analysis/AnalysisCompletedEvent.java (1)
27-42: Consider a defensive copy of the metrics map to preserve immutability.

The metrics map is stored directly, allowing the caller to mutate event data after construction. For a proper immutable event, wrap it in an unmodifiable view.

♻️ Suggested fix
+import java.util.Collections;
+
 public AnalysisCompletedEvent(Object source, String correlationId, Long projectId, Long jobId,
                               CompletionStatus status, Duration duration, int issuesFound,
                               int filesAnalyzed, String errorMessage, Map<String, Object> metrics) {
     super(source, correlationId);
     this.projectId = projectId;
     this.jobId = jobId;
     this.status = status;
     this.duration = duration;
     this.issuesFound = issuesFound;
     this.filesAnalyzed = filesAnalyzed;
     this.errorMessage = errorMessage;
-    this.metrics = metrics;
+    this.metrics = metrics != null ? Collections.unmodifiableMap(metrics) : null;
 }

java-ecosystem/services/pipeline-agent/src/main/java/org/rostilos/codecrow/pipelineagent/generic/controller/ProviderWebhookController.java (2)
186-188: Security: Signature validation TODO should be addressed before release.

The webhook endpoint without a token accepts all webhooks for configured repos without cryptographic validation. This is a security gap that allows spoofed webhooks to trigger analysis jobs.
Would you like me to help implement signature validation for GitHub (X-Hub-Signature-256) and GitLab (X-Gitlab-Token) webhooks? This should be prioritized before the 1.2.0 release.
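For reference, a minimal sketch of GitHub-style HMAC validation with javax.crypto. Header parsing and secret sourcing are assumptions; GitLab's X-Gitlab-Token is a plain shared-secret comparison rather than an HMAC:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class WebhookSignatureExample {
    /** Validates an X-Hub-Signature-256 header ("sha256=<hex>") against the raw body. */
    static boolean isValidGitHubSignature(String secret, byte[] rawBody, String header)
            throws Exception {
        if (header == null || !header.startsWith("sha256=")) {
            return false;
        }
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        String expected = "sha256=" + HexFormat.of().formatHex(mac.doFinal(rawBody));
        // Constant-time comparison to avoid timing side channels.
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                header.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        byte[] body = "{\"action\":\"opened\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidGitHubSignature("s3cret", body, "sha256=deadbeef")); // false
    }
}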
309-312: Consider extracting PR merge detection into a helper method for readability.

The compound boolean expression spans multiple conditions and providers. Extracting this to a named method would improve clarity.
♻️ Suggested refactor
private boolean isPrMergeEvent(WebhookPayload payload) {
    String eventType = payload.eventType();
    if ("pullrequest:fulfilled".equals(eventType)) {
        return true; // Bitbucket
    }
    if ("pull_request.closed".equals(eventType) && payload.rawPayload() != null) {
        return payload.rawPayload().path("pull_request").path("merged").asBoolean(false); // GitHub
    }
    return false;
}

java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/persistence/repository/rag/RagDeltaIndexRepository.java (2)
27-28: Hardcoded status strings in JPQL queries are fragile.

Lines 27-28, 33-34, 38-40, 43-45, 55-56, and 58-60 use hardcoded 'READY', 'STALE', 'ARCHIVED' strings. If the DeltaIndexStatus enum values change, these queries will silently break. Consider using parameterized queries with the enum, similar to findByProjectIdAndStatus.

♻️ Example refactor for findReadyByProjectId
-@Query("SELECT d FROM RagDeltaIndex d WHERE d.project.id = :projectId AND d.status = 'READY'")
-List<RagDeltaIndex> findReadyByProjectId(@Param("projectId") Long projectId);
+@Query("SELECT d FROM RagDeltaIndex d WHERE d.project.id = :projectId AND d.status = :status")
+List<RagDeltaIndex> findReadyByProjectId(@Param("projectId") Long projectId, @Param("status") DeltaIndexStatus status);

Or use the derived query method pattern:
List<RagDeltaIndex> findByProjectIdAndStatus(Long projectId, DeltaIndexStatus status);
42-51: Consider adding @Transactional to @Modifying methods.

While Spring Data JPA often handles transactions automatically, explicitly annotating @Modifying methods with @Transactional ensures consistent behavior across different calling contexts.

@Modifying
@Transactional
@Query("UPDATE RagDeltaIndex d SET d.status = 'STALE', d.updatedAt = CURRENT_TIMESTAMP " +
       "WHERE d.project.id = :projectId AND d.baseBranch = :baseBranch AND d.status = 'READY'")
int markDeltasAsStale(@Param("projectId") Long projectId, @Param("baseBranch") String baseBranch);

java-ecosystem/libs/core/src/main/resources/db/migration/1.2.0/V1.2.0__add_rag_delta_index_table.sql (1)
24-28: Index on branch_name alone may have limited utility.

The index idx_rag_delta_branch on branch_name (line 27) without project_id will likely have low selectivity, since branch names like main, master, and develop are common across projects. Queries typically filter by project_id first.
python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py (2)
5-5: Avoid a mutable default list for diff_snippets.

Use a default factory to avoid shared defaults across requests.
♻️ Proposed refactor
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
-    diff_snippets: Optional[List[str]] = []
+    diff_snippets: Optional[List[str]] = Field(default_factory=list)

Also applies to: 72-80
371-442: Use exception chaining and logger.exception in new endpoints.

Catching broad Exception without chaining drops stack context, and returning raw error text can leak internals. Also, line 548 uses an f-string without placeholders.

Apply the same pattern to the other new delta/hybrid endpoints for consistency.

🛠️ Suggested pattern
-    except Exception as e:
-        logger.error(f"Error creating delta index: {e}")
-        raise HTTPException(status_code=500, detail=str(e))
+    except Exception as e:
+        logger.exception("Error creating delta index")
+        raise HTTPException(status_code=500, detail="Internal server error") from e
@@
-    logger.warning(f"No base index available for hybrid query")
+    logger.warning("No base index available for hybrid query")
java-ecosystem/libs/events/src/main/java/module-info.java (1)
1-9: Module naming inconsistency.

The module name codecrow.events deviates from the established naming pattern used by other modules in this codebase (e.g., org.rostilos.codecrow.core, org.rostilos.codecrow.ragengine, org.rostilos.codecrow.analysisapi). Consider renaming to org.rostilos.codecrow.events for consistency.

♻️ Suggested fix
-module codecrow.events {
+module org.rostilos.codecrow.events {
     requires spring.context;
     requires org.rostilos.codecrow.core;
     exports org.rostilos.codecrow.events;
     exports org.rostilos.codecrow.events.analysis;
     exports org.rostilos.codecrow.events.rag;
     exports org.rostilos.codecrow.events.project;
 }
requires codecrow.events;statements in dependent modules (e.g.,rag-engine/module-info.javaat line 14) torequires org.rostilos.codecrow.events;.java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/CreateProjectRequest.java (1)
29-36: Consider adding deprecation metadata for consistency.

The @Deprecated annotations lack the since and forRemoval attributes. Following the pattern used elsewhere in the codebase (e.g., ProjectVcsConnectionBinding uses @Deprecated(since = "2.0", forRemoval = true)), consider enriching the annotations:

♻️ Suggested improvement
 // optional main branch - the primary branch used as baseline for RAG indexing and analysis
 private String mainBranch;

 /**
  * @deprecated Use mainBranch instead
  */
-@Deprecated
+@Deprecated(since = "1.2.0", forRemoval = true)
 private String defaultBranch;
java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/project/config/RagConfig.java (1)
28-42: Consider consolidating constructor overloads using a builder or defaults.

The four constructors create maintenance overhead. As the record grows, consider using a builder pattern or reducing to the canonical constructor with null defaults; see the sketch below.
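A sketch of the builder idea for a growing record; the field names and defaults here are illustrative, not the actual RagConfig fields:

public final class RagConfigBuilderExample {
    // Illustrative record; the real RagConfig has different fields.
    record RagConfig(boolean enabled, String baseBranch, Integer topK, Boolean deltaEnabled) {
        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private boolean enabled = true;      // defaults live in one place
            private String baseBranch = "main";
            private Integer topK;
            private Boolean deltaEnabled;

            Builder enabled(boolean v) { this.enabled = v; return this; }
            Builder baseBranch(String v) { this.baseBranch = v; return this; }
            Builder topK(Integer v) { this.topK = v; return this; }
            Builder deltaEnabled(Boolean v) { this.deltaEnabled = v; return this; }

            RagConfig build() {
                return new RagConfig(enabled, baseBranch, topK, deltaEnabled);
            }
        }
    }

    public static void main(String[] args) {
        RagConfig cfg = RagConfig.builder().baseBranch("develop").topK(8).build();
        System.out.println(cfg);
    }
}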
java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/DeltaIndexService.java (1)
66-77: Consider using a more specific exception type instead of RuntimeException.

Wrapping IOException in a generic RuntimeException loses exception type information. A custom RagPipelineException or UncheckedIOException would preserve more context for callers.

♻️ Optional improvement
- } catch (IOException e) {
-     throw new RuntimeException("Failed to create delta index: " + e.getMessage(), e);
+ } catch (IOException e) {
+     throw new java.io.UncheckedIOException("Failed to create delta index: " + e.getMessage(), e);
  }

java-ecosystem/libs/rag-engine/src/main/java/org/rostilos/codecrow/ragengine/service/RagOperationsServiceImpl.java (1)
358-383: Consider handling the case where the delta index entity might not exist in the error handler.

In the catch block, ragDeltaIndexRepository.findByProjectIdAndBranchName could return empty if the entity was never persisted (e.g., if an exception occurred before line 312). The ifPresent handles this, but the error path would silently skip updating the status.

This is minor since the job failure is still recorded, but logging when the entity isn't found would help debugging:
♻️ Optional improvement
  ragDeltaIndexRepository.findByProjectIdAndBranchName(project.getId(), deltaBranch)
-     .ifPresent(di -> {
+     .ifPresentOrElse(di -> {
          di.setStatus(DeltaIndexStatus.FAILED);
          di.setErrorMessage(e.getMessage());
          di.setUpdatedAt(OffsetDateTime.now());
          ragDeltaIndexRepository.save(di);
-     });
+     }, () -> log.debug("Delta index entity not found for failure update - may not have been persisted yet"));
index.ts
.env
server.log
(No newline at end of file)
index.ts seems out of place in a Java module.
This is a Java Maven module (analysis-api), but index.ts is a TypeScript entry point file. This appears to be copy-pasted from a Node.js project template. Consider removing it if not applicable to this module.
.env and server.log are reasonable entries for any project.
🤖 Prompt for AI Agents
In `@java-ecosystem/libs/analysis-api/.gitignore` around lines 38 - 40, Remove the
stray "index.ts" entry from the analysis-api module's .gitignore because this is
a Java/Maven module and that TypeScript entry is likely copy-pasted; keep the
`.env` and `server.log` entries intact and only delete the "index.ts" line in
the .gitignore for the analysis-api module.
default void createOrUpdateDeltaIndex(
        Project project,
        String deltaBranch,
        String baseBranch,
        String deltaCommit,
        String rawDiff,
        Consumer<Map<String, Object>> eventConsumer
) {
    // Default implementation does nothing - override in actual implementation
    eventConsumer.accept(Map.of(
            "type", "warning",
            "message", "Delta index operations not implemented"
    ));
}
Guard eventConsumer to avoid default-method NPEs.
Line 131 calls eventConsumer.accept(...) unconditionally. If callers pass null, this will throw even though the default impl is supposed to be a safe no-op.
🛠️ Suggested fix
- eventConsumer.accept(Map.of(
- "type", "warning",
- "message", "Delta index operations not implemented"
- ));
+ if (eventConsumer != null) {
+ eventConsumer.accept(Map.of(
+ "type", "warning",
+ "message", "Delta index operations not implemented"
+ ));
+ }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
default void createOrUpdateDeltaIndex(
        Project project,
        String deltaBranch,
        String baseBranch,
        String deltaCommit,
        String rawDiff,
        Consumer<Map<String, Object>> eventConsumer
) {
    // Default implementation does nothing - override in actual implementation
    if (eventConsumer != null) {
        eventConsumer.accept(Map.of(
                "type", "warning",
                "message", "Delta index operations not implemented"
        ));
    }
}
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`
around lines 122 - 135, The default method createOrUpdateDeltaIndex currently
calls eventConsumer.accept(...) unguarded which can NPE if callers pass null;
update the method to guard eventConsumer (e.g., if (eventConsumer != null)
eventConsumer.accept(...), or use Optional.ofNullable(eventConsumer).ifPresent(c
-> c.accept(...))) so the default implementation remains a safe no-op when
eventConsumer is null; reference the eventConsumer parameter in the
createOrUpdateDeltaIndex method and apply the null-check before invoking accept.
default boolean isDeltaIndexReady(Project project, String branchName) {
    return false;
}

/**
 * Decision record for hybrid RAG usage.
 */
record HybridRagDecision(
        boolean useHybrid,
        String baseBranch,
        String targetBranch,
        boolean deltaAvailable,
        String reason
) {}

/**
 * Determine if hybrid RAG should be used for a PR.
 *
 * @param project The project
 * @param targetBranch The PR target branch
 * @return Decision about whether to use hybrid RAG
 */
default HybridRagDecision shouldUseHybridRag(Project project, String targetBranch) {
    if (!isRagEnabled(project)) {
        return new HybridRagDecision(false, null, targetBranch, false, "rag_disabled");
    }

    String baseBranch = getBaseBranch(project);

    // If target is the base branch, no need for hybrid
    if (baseBranch.equals(targetBranch)) {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "target_is_base");
    }

    // Check if delta is enabled and available
    if (!isDeltaIndexEnabled(project)) {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "delta_disabled");
    }

    boolean deltaReady = isDeltaIndexReady(project, targetBranch);
    if (deltaReady) {
        return new HybridRagDecision(true, baseBranch, targetBranch, true, "delta_available");
    } else {
        return new HybridRagDecision(false, baseBranch, targetBranch, false, "delta_not_ready");
    }
Avoid possible NPE in base/target comparison.
Line 174 assumes baseBranch is non-null; a misconfigured project could still return null and throw. Consider a null-safe comparison.
🛠️ Suggested fix
-import java.util.function.Consumer;
-import java.util.Map;
+import java.util.Map;
+import java.util.Objects;
+import java.util.function.Consumer;
@@
- if (baseBranch.equals(targetBranch)) {
+ if (Objects.equals(baseBranch, targetBranch)) {
          return new HybridRagDecision(false, baseBranch, targetBranch, false, "target_is_base");
      }

🤖 Prompt for AI Agents
}🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java`
around lines 144 - 188, The comparison baseBranch.equals(targetBranch) in
shouldUseHybridRag can NPE if getBaseBranch() returns null; change it to a
null-safe comparison (e.g., use Objects.equals(baseBranch, targetBranch) or
explicitly check baseBranch == null before calling equals) so the logic safely
handles a null baseBranch and proceeds to the delta checks; keep the rest of the
method and HybridRagDecision construction the same so null baseBranch is
preserved in the decision when appropriate.
org/rostilos/codecrow/analysisapi/rag/RagOperationsService.class
module-info.class
org/rostilos/codecrow/analysisapi/rag/RagOperationsService$HybridRagDecision.class
Avoid committing build artifacts from target/.
These generated files should be removed from the PR and covered by .gitignore to keep VCS clean.
🧹 Suggested removal
-org/rostilos/codecrow/analysisapi/rag/RagOperationsService.class
-module-info.class
-org/rostilos/codecrow/analysisapi/rag/RagOperationsService$HybridRagDecision.class

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/createdFiles.lst`
around lines 1 - 3, Remove the generated build artifacts (e.g., createdFiles.lst
entries and the class files RagOperationsService.class,
RagOperationsService$HybridRagDecision.class, module-info.class) from the PR and
ensure they are not tracked: delete them from the index (git rm --cached or
remove and recommit) and add a rule to ignore build outputs (e.g., add target/
to .gitignore). Re-run the build to confirm these files are untracked, then
commit the updated .gitignore and the removal changes.
/var/www/html/codecrow/codecrow-public/java-ecosystem/libs/analysis-api/src/main/java/module-info.java
/var/www/html/codecrow/codecrow-public/java-ecosystem/libs/analysis-api/src/main/java/org/rostilos/codecrow/analysisapi/rag/RagOperationsService.java
Remove generated build artifacts from version control.
This target/maven-status/.../inputFiles.lst is a build output and should not be committed. Please delete it and add a gitignore rule for target/ (or at least this path) to prevent future noise.
🧹 Suggested .gitignore entry
+**/target/

🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-api/target/maven-status/maven-compiler-plugin/compile/default-compile/inputFiles.lst`
around lines 1 - 2, Delete the generated Maven build artifact inputFiles.lst
from the repo and remove any other files under the target/maven-status tree;
then add a .gitignore entry to exclude target/ (or at minimum
target/maven-status/** and inputFiles.lst) so future build outputs aren’t
committed. Commit the deletion and the updated .gitignore together with a
concise commit message like "chore: remove generated target artifacts and ignore
target/".
public String getMainBranch() {
    return mainBranch != null ? mainBranch : defaultBranch;
}
Handle blank mainBranch to preserve fallback behavior.
If a client sends an empty string, fallback won’t happen and downstream code may receive an invalid branch name. Consider treating blank as absent.
Proposed fix
public String getMainBranch() {
- return mainBranch != null ? mainBranch : defaultBranch;
+ return (mainBranch != null && !mainBranch.isBlank()) ? mainBranch : defaultBranch;
 }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public String getMainBranch() {
    return (mainBranch != null && !mainBranch.isBlank()) ? mainBranch : defaultBranch;
}
🤖 Prompt for AI Agents
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/dto/request/UpdateProjectRequest.java`
around lines 35 - 37, UpdateProjectRequest.getMainBranch currently only falls
back to defaultBranch when mainBranch is null but not when it's an empty or
whitespace string; change the logic to treat blank values as absent by returning
defaultBranch when mainBranch is null or blank (e.g., use String.isBlank() or
trim().isEmpty()), ensuring callers of getMainBranch() never receive
empty/whitespace branch names.
// Update main branch in project configuration if provided
if (request.getMainBranch() != null) {
    var cfg = project.getConfiguration();
    boolean useLocal = cfg == null ? false : cfg.useLocalMcp();
    var branchAnalysis = cfg != null ? cfg.branchAnalysis() : null;
    var ragConfig = cfg != null ? cfg.ragConfig() : null;
    Boolean prAnalysisEnabled = cfg != null ? cfg.prAnalysisEnabled() : true;
    Boolean branchAnalysisEnabled = cfg != null ? cfg.branchAnalysisEnabled() : true;
    var installationMethod = cfg != null ? cfg.installationMethod() : null;
    var commentCommands = cfg != null ? cfg.commentCommands() : null;
    project.setConfiguration(new ProjectConfig(useLocal, request.getDefaultBranch(), branchAnalysis, ragConfig,
            prAnalysisEnabled, branchAnalysisEnabled, installationMethod, commentCommands));
    if (cfg == null) {
        cfg = new ProjectConfig(false, request.getMainBranch());
    } else {
        cfg.setMainBranch(request.getMainBranch());
    }
    cfg.ensureMainBranchInPatterns();
    project.setConfiguration(cfg);
}
Guard against blank mainBranch updates.
createProject filters blanks, but updateProject accepts them; that can store an empty main branch or inject invalid patterns.
🛠️ Proposed fix
- if (request.getMainBranch() != null) {
+ if (request.getMainBranch() != null && !request.getMainBranch().isBlank()) {
var cfg = project.getConfiguration();
if (cfg == null) {
cfg = new ProjectConfig(false, request.getMainBranch());
} else {
cfg.setMainBranch(request.getMainBranch());
}
cfg.ensureMainBranchInPatterns();
project.setConfiguration(cfg);
 }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
        // Update main branch in project configuration if provided
        if (request.getMainBranch() != null && !request.getMainBranch().isBlank()) {
            var cfg = project.getConfiguration();
            if (cfg == null) {
                cfg = new ProjectConfig(false, request.getMainBranch());
            } else {
                cfg.setMainBranch(request.getMainBranch());
            }
            cfg.ensureMainBranchInPatterns();
            project.setConfiguration(cfg);
        }
```
🤖 Prompt for AI Agents
In
`@java-ecosystem/services/web-server/src/main/java/org/rostilos/codecrow/webserver/project/service/ProjectService.java`
around lines 281 - 291, The update path currently accepts
request.getMainBranch() even when empty; change the guard in ProjectService (the
updateProject flow around request.getMainBranch()) to only proceed when
mainBranch is non-null and not blank (e.g., trim-empty), and otherwise skip
updating the ProjectConfig (or reject the request); update the same block that
calls ProjectConfig.ensureMainBranchInPatterns() and
project.setConfiguration(cfg) so you never store an empty main branch or inject
invalid patterns.
```diff
 logger = logging.getLogger(__name__)

-app = FastAPI(title="CodeCrow RAG API", version="1.0.0")
+app = FastAPI(title="CodeCrow RAG API", version="1.1.0")
```
Align root version with app version.
The app is set to 1.1.0 (Line 16), but the root endpoint still reports 1.0.0. This can confuse clients.
🛠️ Proposed fix
```diff
 def root():
-    return {"message": "CodeCrow RAG Pipeline API", "version": "1.0.0"}
+    return {"message": "CodeCrow RAG Pipeline API", "version": "1.1.0"}
```

Also applies to: 127-129
🤖 Prompt for AI Agents
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/api/api.py` at line 16, The
root endpoint returns a hardcoded "1.0.0" while the FastAPI app is created with
version "1.1.0"; update the root endpoint implementation (the function handling
"/") to return the app.version value or to match "1.1.0" so they stay consistent
(refer to the FastAPI instance declared as app = FastAPI(...)); prefer using
app.version in the root response to prevent future drift.
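For completeness, a minimal sketch of the drift-proof variant the prompt describes, reading the version off the `app` instance instead of hardcoding it. This is an illustration, not the file's actual contents; only the `FastAPI(title=..., version=...)` line is taken from the hunk above:

```python
from fastapi import FastAPI

app = FastAPI(title="CodeCrow RAG API", version="1.1.0")

@app.get("/")
def root():
    # Report app.version so the root endpoint can never drift
    # from the version declared on the FastAPI instance.
    return {"message": "CodeCrow RAG Pipeline API", "version": app.version}
```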
```python
# Convert to Path objects and filter existing files
file_paths = []
for f in changed_files:
    full_path = repo_path_obj / f
    if full_path.exists() and full_path.is_file():
        file_paths.append(full_path)
    else:
        logger.debug(f"Skipping non-existent file: {f}")

# Process in batches
for i in range(0, len(file_paths), DOCUMENT_BATCH_SIZE):
    batch = file_paths[i:i + DOCUMENT_BATCH_SIZE]

    documents = self.loader.load_specific_files(
        file_paths=batch,
        repo_base=repo_path_obj,
        workspace=workspace,
        project=project,
        branch=delta_branch,
        commit=delta_commit or "unknown"
    )
```
Delta loader receives absolute paths (breaks metadata + deletion).
DocumentLoader.load_specific_files(...) expects repo‑relative paths; passing full paths makes metadata["path"] absolute, which then won’t match the relative changed_files used in deletions and hybrid dedupe.
🛠️ Suggested fix
```diff
-    file_paths = []
-    for f in changed_files:
-        full_path = repo_path_obj / f
-        if full_path.exists() and full_path.is_file():
-            file_paths.append(full_path)
+    file_paths = []
+    for f in changed_files:
+        relative_path = Path(f)
+        full_path = repo_path_obj / relative_path
+        if full_path.exists() and full_path.is_file():
+            file_paths.append(relative_path)
         else:
             logger.debug(f"Skipping non-existent file: {f}")
@@
-    file_paths = []
-    for f in changed_files:
-        full_path = repo_path_obj / f
-        if full_path.exists() and full_path.is_file():
-            file_paths.append(full_path)
+    file_paths = []
+    for f in changed_files:
+        relative_path = Path(f)
+        full_path = repo_path_obj / relative_path
+        if full_path.exists() and full_path.is_file():
+            file_paths.append(relative_path)
```

Also applies to: 503-549
🤖 Prompt for AI Agents
In `@python-ecosystem/rag-pipeline/src/rag_pipeline/core/delta_index_manager.py`
around lines 340 - 360, The loader is being given absolute filesystem paths so
metadata["path"] becomes absolute and later deletions/dedupe (which use
repo-relative changed_files) fail; instead, keep the existence check using
full_path (repo_path_obj / f) but pass repo-relative Path/str to
DocumentLoader.load_specific_files. Concretely, in the block that builds
file_paths and calls load_specific_files, change file_paths to contain the
original repo-relative entries (e.g., f or Path(f)) after verifying
full_path.exists() and full_path.is_file(), and update the other similar section
(the repeated block around the 503-549 area) the same way so load_specific_files
receives repo-relative paths while existence checks use full_path.
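To make the relative-vs-absolute distinction concrete, here is a standalone sketch of the intended filtering step with the loader call omitted; the helper name `filter_existing_files` is hypothetical, and `repo_path_obj`/`changed_files` mirror the snippet above:

```python
from pathlib import Path

def filter_existing_files(repo_path_obj: Path, changed_files: list[str]) -> list[Path]:
    """Keep repo-relative paths; use the absolute path only for existence checks."""
    file_paths: list[Path] = []
    for f in changed_files:
        relative_path = Path(f)
        # The absolute path is needed to probe the filesystem...
        full_path = repo_path_obj / relative_path
        if full_path.exists() and full_path.is_file():
            # ...but the relative path is what gets stored, so metadata["path"]
            # stays comparable with the repo-relative changed_files entries.
            file_paths.append(relative_path)
    return file_paths
```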
```python
def get_hybrid_context_for_pr(
    self,
    workspace: str,
    project: str,
    base_branch: str,
    target_branch: str,
    changed_files: List[str],
    diff_snippets: Optional[List[str]] = None,
    pr_title: Optional[str] = None,
    pr_description: Optional[str] = None,
    top_k: int = 15,
    enable_priority_reranking: bool = True,
    min_relevance_score: float = 0.7,
    delta_boost: float = DEFAULT_DELTA_BOOST
) -> HybridQueryResult:
    """
    Get PR context using hybrid retrieval from base + delta indexes.

    Args:
        workspace: Workspace identifier
        project: Project identifier
        base_branch: The base RAG index branch (e.g., "master")
        target_branch: The PR target branch (e.g., "release/1.0")
        changed_files: Files changed in the PR
        diff_snippets: Code snippets from the diff
        pr_title: PR title for semantic understanding
        pr_description: PR description
        top_k: Number of results to retrieve
        enable_priority_reranking: Apply priority-based boosting
        min_relevance_score: Minimum score threshold
        delta_boost: Score multiplier for delta results

    Returns:
        HybridQueryResult with combined context
    """
    diff_snippets = diff_snippets or []

    logger.info(
        f"Hybrid RAG query: base={base_branch}, target={target_branch}, "
        f"files={len(changed_files)}"
    )

    # Decompose queries
    queries = self._decompose_queries(
        pr_title=pr_title,
        pr_description=pr_description,
        diff_snippets=diff_snippets,
        changed_files=changed_files
    )

    # Collection names
    base_collection = self._get_base_collection_name(workspace, project, base_branch)
    delta_collection = self._get_delta_collection_name(workspace, project, target_branch)

    # Check what's available
    base_exists = self._collection_exists(base_collection)
    delta_exists = self._collection_exists(delta_collection)

    logger.info(f"Collections: base={base_exists}, delta={delta_exists}")

    all_base_results = []
    all_delta_results = []

    # Execute queries against both indexes
    for q_text, q_weight, q_top_k, q_instruction_type in queries:
        if not q_text.strip():
            continue

        # Query base index
        if base_exists:
            base_results = self._query_collection(
                base_collection, q_text, q_top_k, q_instruction_type
            )
            for r in base_results:
                r["_query_weight"] = q_weight
            all_base_results.extend(base_results)

        # Query delta index (with reduced top_k since it's supplementary)
        if delta_exists:
            delta_results = self._query_collection(
                delta_collection, q_text, max(3, q_top_k // 2), q_instruction_type
            )
            for r in delta_results:
                r["_query_weight"] = q_weight
            all_delta_results.extend(delta_results)

    # Merge results
    merged_results = self._merge_hybrid_results(
        base_results=all_base_results,
        delta_results=all_delta_results,
        delta_boost=delta_boost
    )

    # Apply priority reranking
    if enable_priority_reranking:
        final_results = self._apply_priority_reranking(
            merged_results,
            min_score_threshold=min_relevance_score
        )
    else:
        final_results = [r for r in merged_results if r['score'] >= min_relevance_score]
        final_results.sort(key=lambda x: x['score'], reverse=True)

    # Fallback if too strict
    if not final_results and merged_results:
        logger.info("Hybrid RAG: threshold too strict, using top raw results")
        seen = set()
        unique_fallback = []
        for r in sorted(merged_results, key=lambda x: x['score'], reverse=True):
            content_hash = f"{r['metadata'].get('path', '')}:{r['text'][:100]}"
            if content_hash not in seen:
                seen.add(content_hash)
                unique_fallback.append(r)
        final_results = unique_fallback[:5]

    # Collect related files
    related_files = set()
    for result in final_results:
        if "path" in result["metadata"]:
            related_files.add(result["metadata"]["path"])

    # Build response
    relevant_code = []
    for result in final_results:
        relevant_code.append({
            "text": result["text"],
            "score": result["score"],
            "metadata": result["metadata"],
            "_source": result.get("_source", "unknown")
        })

    # Count sources
    base_count = sum(1 for r in relevant_code if r.get("_source") == "base")
    delta_count = sum(1 for r in relevant_code if r.get("_source") == "delta")

    logger.info(
        f"Hybrid RAG: {len(relevant_code)} results "
        f"({base_count} base, {delta_count} delta) from {len(related_files)} files"
    )

    return HybridQueryResult(
        relevant_code=relevant_code,
        related_files=list(related_files),
        changed_files=changed_files,
        hybrid_metadata={
            "base_branch": base_branch,
            "target_branch": target_branch,
            "base_collection_used": base_exists,
            "delta_collection_used": delta_exists,
            "base_results_count": len(all_base_results),
            "delta_results_count": len(all_delta_results),
            "merged_results_count": len(merged_results),
            "final_results_count": len(final_results),
            "delta_boost_applied": delta_boost
        }
    )
```
top_k argument is currently ignored.
Line 307 exposes top_k, but it never affects retrieval sizes, so callers can’t tune result breadth.
🛠️ Suggested fix
```diff
-            if base_exists:
+            if base_exists:
+                base_top_k = min(q_top_k, top_k)
                 base_results = self._query_collection(
-                    base_collection, q_text, q_top_k, q_instruction_type
+                    base_collection, q_text, base_top_k, q_instruction_type
                 )
@@
-            if delta_exists:
+            if delta_exists:
+                delta_top_k = min(max(3, q_top_k // 2), max(3, top_k // 2))
                 delta_results = self._query_collection(
-                    delta_collection, q_text, max(3, q_top_k // 2), q_instruction_type
+                    delta_collection, q_text, delta_top_k, q_instruction_type
                 )
```

🧰 Tools
🪛 Ruff (0.14.13)
307-307: Unused method argument: top_k
(ARG002)
🤖 Prompt for AI Agents
In
`@python-ecosystem/rag-pipeline/src/rag_pipeline/services/hybrid_query_service.py`
around lines 297 - 452, The top_k parameter of get_hybrid_context_for_pr is
never used, so callers can’t control result breadth; update the retrieval calls
so per-query q_top_k is bounded by the function-level top_k: when iterating
queries (the queries variable produced by _decompose_queries), replace uses of
q_top_k passed into _query_collection with min(q_top_k, top_k) for base queries
and for delta queries use min(max(3, q_top_k // 2), top_k) (or otherwise ensure
delta never requests more than top_k), so _query_collection respects the top_k
argument; you can alternatively pass top_k into _decompose_queries if that
function should produce sized queries, but ensure get_hybrid_context_for_pr
enforces the top_k cap when calling _query_collection.
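As a sanity check on the capping arithmetic, a small self-contained example; `capped_budgets` is a hypothetical helper that mirrors the logic in the suggested fix, not a function in the codebase:

```python
def capped_budgets(q_top_k: int, top_k: int) -> tuple[int, int]:
    """Bound per-query retrieval sizes by the caller-supplied top_k."""
    base_top_k = min(q_top_k, top_k)
    # The delta index is supplementary: request roughly half,
    # but never fewer than 3 results.
    delta_top_k = min(max(3, q_top_k // 2), max(3, top_k // 2))
    return base_top_k, delta_top_k

# A query plan asks for 10 results, but the caller capped top_k at 6:
print(capped_budgets(10, 6))  # -> (6, 3)
```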
…pport; refactor async processing configurations
Unit test suite. 80% coverage for java-ecosystem lib packages
…d-search Bugfix/branch list pagination and search
…uplication
- Removed the HybridQueryService and integrated its functionality into RAGQueryService.
- Enhanced RAGQueryService to support multi-branch queries, allowing simultaneous searches across target and base branches.
- Implemented deduplication logic to prioritize results from the target branch while preserving cross-file relationships.
- Updated utility functions to create project-level namespaces for collections.
- Improved error handling and logging for collection existence checks and fallback branch logic.
- Adjusted semantic search methods to accommodate branch filtering and excluded paths.
- Enhanced documentation and comments for clarity on new functionalities and changes.
Multi branch indexes
feat: Enhance OpenRouterEmbedding with debug logging and batch size h…
Summary by CodeRabbit
Release Notes
New Features
`mainBranch` configuration with backward compatibility for `defaultBranch`
Chores
✏️ Tip: You can customize this high-level summary in your review settings.