(retriever) Only overwrite LanceDB table once if specified #1402

Open
charlesbluca wants to merge 1 commit into NVIDIA:retriever from charlesbluca:ret-fix-inprocess-overwrite

@charlesbluca
Collaborator

Description

Problem: With overwrite=True, the inprocess pipeline ran vdb_upload once per document. Each call replaced the LanceDB table, so only the last PDF’s embeddings remained and recall dropped.

Fix:

  • Split tasks into per-doc (extract → embed) and post (vdb_upload, save_to_disk).
  • Run post tasks once on the concatenated results of all documents.
  • With overwrite=True, the table is now replaced a single time with the full corpus.
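
The difference between the two behaviours can be illustrated with a minimal sketch. This is not the code from the PR: the dict-backed store, `split_tasks`, `extract_and_embed`, and the two `run_*` functions are hypothetical stand-ins, and only the task names `vdb_upload` and `save_to_disk` and the `overwrite` flag come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a vector DB: table name -> stored rows.
FakeStore = Dict[str, List[dict]]

# Tasks that should run once over the whole corpus (names from the PR description).
POST_TASKS = {"vdb_upload", "save_to_disk"}


def split_tasks(tasks: List[str]) -> Tuple[List[str], List[str]]:
    """Partition task names into per-document and post tasks, preserving order."""
    per_doc = [t for t in tasks if t not in POST_TASKS]
    post = [t for t in tasks if t in POST_TASKS]
    return per_doc, post


def extract_and_embed(doc: str) -> List[dict]:
    """Toy per-document stage: emit one fake embedding row per document."""
    return [{"source": doc, "vector": [float(len(doc))]}]


def vdb_upload(store: FakeStore, rows: List[dict],
               table: str = "docs", overwrite: bool = True) -> None:
    """Toy upload: overwrite=True replaces the table, otherwise rows are appended."""
    if overwrite or table not in store:
        store[table] = list(rows)
    else:
        store[table].extend(rows)


def run_per_document(docs: List[str]) -> FakeStore:
    """Behaviour described as the problem: upload (and overwrite) once per document."""
    store: FakeStore = {}
    for doc in docs:
        vdb_upload(store, extract_and_embed(doc), overwrite=True)
    return store


def run_once(docs: List[str]) -> FakeStore:
    """Behaviour described as the fix: concatenate results, then upload a single time."""
    store: FakeStore = {}
    all_rows: List[dict] = []
    for doc in docs:
        all_rows.extend(extract_and_embed(doc))
    vdb_upload(store, all_rows, overwrite=True)
    return store
```

In this toy model, `run_per_document` leaves only the last document's rows in the table, while `run_once` keeps one row per document, which mirrors the recall drop and its fix.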

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

@charlesbluca charlesbluca requested a review from a team as a code owner February 13, 2026 20:49
@charlesbluca charlesbluca requested review from jdye64 and removed request for a team February 13, 2026 20:49