fix(retrieval): make proj: tag queries exact (BM25-only + mustContain) by 1034378361 · Pull Request #56 · win4r/memory-lancedb-pro

1034378361 · 2026-03-05T05:31:51Z

What

When a query contains tag-style tokens like proj:AIF, treat it as an exact filter instead of a semantic query:

Use BM25 (FTS-only) first
Hard filter results: mustContain all extracted proj: tokens
If no literal matches, fall back to existing hybrid retrieval (to avoid returning nothing)

Why

Short tokens are prone to semantic false positives in hybrid retrieval because vector search dominates the candidate pool. For project tags, users usually expect exact matching, not related-content recall.

Scope / Compatibility

Only affects queries containing proj: tokens
Normal natural-language queries keep the original hybrid path

Implementation

src/retriever.ts:

Added tag token extraction (proj:[A-Za-z0-9._-]+)
Added bm25OnlyRetrieval() helper with mustContain filter
Hooked in early in retrieve()
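
For readers without the diff open, the flow described above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code in src/retriever.ts: only the regex body comes from the PR description, while `Hit`, `mustContainAll`, `retrieveSketch`, and the injected `bm25Search` / `hybridSearch` callbacks are hypothetical names, and the case-sensitive substring check is an assumption.

```typescript
// Pattern taken from the PR description; everything else here is illustrative.
const TAG_PATTERN = /proj:[A-Za-z0-9._-]+/g;

interface Hit {
  text: string;
  score: number;
}

// Collect the distinct proj: tokens that appear in the query.
function extractTagTokens(query: string): string[] {
  return Array.from(new Set(query.match(TAG_PATTERN) ?? []));
}

// Keep only hits whose text literally contains every token
// (case-sensitive substring match; an assumption of this sketch).
function mustContainAll(hits: Hit[], tokens: string[]): Hit[] {
  return hits.filter((hit) => tokens.every((token) => hit.text.includes(token)));
}

// Tag queries go BM25-only with a hard literal filter; if nothing matches
// literally, fall back to hybrid retrieval so the caller still gets results.
function retrieveSketch(
  query: string,
  bm25Search: (q: string) => Hit[],
  hybridSearch: (q: string) => Hit[],
): Hit[] {
  const tokens = extractTagTokens(query);
  if (tokens.length === 0) {
    return hybridSearch(query); // normal natural-language path is unchanged
  }
  const literal = mustContainAll(bm25Search(query), tokens);
  return literal.length > 0 ? literal : hybridSearch(query);
}
```

The real retrieve() is presumably asynchronous and works on LanceDB results; the sketch only shows the ordering of the three steps.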

Related issue

Fixes: #55

rwmjhb

Thanks for the PR! The approach is sound — routing tag-style queries to BM25 + mustContain is a clean solution for the false-positive problem described in #55.

I verified locally that LanceDB FTS treats : as a token delimiter (not a Tantivy field separator), so the BM25 stage correctly returns candidates and mustContain filters them down. The core logic works.

A few suggestions before merging:

1. Missing test coverage

+81 lines of new logic with no tests. At minimum, a unit test for extractTagTokens() and an integration-level test verifying that a proj:AIF query returns only entries literally containing that tag would be helpful.
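
To make the request concrete, here is a rough idea of the cases such tests might cover. It is a self-contained sketch: `extractTagTokens` below is a stand-in that assumes the helper returns deduplicated string tokens, `filterByTags` and `expectEqual` are hypothetical, and a real test would import from src/retriever.ts and use the project's own test runner against an actual store.

```typescript
// Stand-in for the helper added by the PR; the real signature may differ.
function extractTagTokens(query: string): string[] {
  return Array.from(new Set(query.match(/proj:[A-Za-z0-9._-]+/g) ?? []));
}

interface Entry {
  id: string;
  text: string;
}

// Stand-in for the literal filter stage (hypothetical name).
function filterByTags(entries: Entry[], tokens: string[]): Entry[] {
  return entries.filter((entry) => tokens.every((token) => entry.text.includes(token)));
}

// Tiny assertion helper so the sketch needs no test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`${label}: got ${JSON.stringify(actual)}`);
  }
}

// Unit-level cases for token extraction.
expectEqual(extractTagTokens("status of proj:AIF"), ["proj:AIF"], "single tag");
expectEqual(
  extractTagTokens("proj:AIF proj:web-2.0 proj:AIF"),
  ["proj:AIF", "proj:web-2.0"],
  "duplicates collapse, order preserved",
);
expectEqual(extractTagTokens("plain natural language query"), [], "no tags");
expectEqual(extractTagTokens("proj: AIF"), [], "prefix without a tag body");

// Integration-flavored case: only entries literally containing the tag survive.
const corpus: Entry[] = [
  { id: "a", text: "Kickoff notes proj:AIF" },
  { id: "b", text: "AI framework comparison" },
  { id: "c", text: "Budget review proj:OTHER" },
];
const matchedIds = filterByTags(corpus, extractTagTokens("proj:AIF")).map((e) => e.id);
expectEqual(matchedIds, ["a"], "proj:AIF query returns only literal matches");
```

Entry "b" is the kind of semantically related but untagged record that #55 describes as a false positive, so it is worth keeping in the fixture.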

2. `as RetrievalResult` type safety

const mapped = literalFiltered.map(
  (result, index) =>
    ({
      ...result,
      sources: {
        bm25: { score: result.score, rank: index + 1 },
        fused: { score: result.score },
      },
    }) as RetrievalResult,
);

sources.vector is absent here. If any downstream code accesses sources.vector.score, it will throw. Worth checking that no consumer expects it, or add vector: undefined explicitly.
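
One way to let the compiler check this instead of the cast is sketched below. The shape of `RetrievalResult` is an assumption on my part (the actual type lives in the repo and may differ), and `Bm25Hit`, `toBm25OnlyResults`, and `vectorScoreOf` are hypothetical names. The idea is to declare `sources.vector` optional, annotate the callback's return type so a missing required field becomes a compile error, and have consumers read the vector score through optional chaining.

```typescript
// Assumed shape for illustration only; not copied from the repo.
interface SourceScore {
  score: number;
  rank?: number;
}

interface RetrievalResult {
  id: string;
  text: string;
  score: number;
  sources: {
    vector?: SourceScore; // optional: absent for BM25-only results
    bm25?: SourceScore;
    fused?: { score: number };
  };
}

type Bm25Hit = Omit<RetrievalResult, "sources">;

// The explicit return type replaces `as RetrievalResult`, so the object
// literal is checked against the interface rather than asserted into it.
function toBm25OnlyResults(hits: Bm25Hit[]): RetrievalResult[] {
  return hits.map(
    (result, index): RetrievalResult => ({
      ...result,
      sources: {
        bm25: { score: result.score, rank: index + 1 },
        fused: { score: result.score },
      },
    }),
  );
}

// Consumer side: optional chaining yields undefined instead of throwing
// a TypeError when the vector source is missing.
function vectorScoreOf(result: RetrievalResult): number | undefined {
  return result.sources.vector?.score;
}
```

Note that writing vector: undefined on the object documents the intent but does not by itself protect a consumer that reads sources.vector.score directly; the guard has to be on the reading side.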

3. (Minor) Extensibility of tag patterns

Currently hardcoded to proj: only. Not a blocker, but consider making the tag prefix configurable (e.g., via retrieval config) or at least using an array of patterns, so adding env:, team:, etc. later doesn't require code changes.
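
A possible shape for that, as a sketch only: build the pattern from a list of prefixes, keeping the same tag-body character class as the PR. The `prefixes` parameter, `buildTagPattern`, and `escapeRegExp` are hypothetical, and where the list would come from (retrieval config or elsewhere) is left open.

```typescript
// Escape regex metacharacters so a configured prefix is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global pattern covering every configured prefix, e.g.
// ["proj", "env"] matches proj:... and env:... tokens.
function buildTagPattern(prefixes: string[]): RegExp {
  const alternation = prefixes.map(escapeRegExp).join("|");
  return new RegExp(`(?:${alternation}):[A-Za-z0-9._-]+`, "g");
}

// Defaults to the PR's current behavior (proj: only).
function extractTagTokens(query: string, prefixes: string[] = ["proj"]): string[] {
  if (prefixes.length === 0) {
    return []; // no prefixes configured: never treat a query as a tag query
  }
  return Array.from(new Set(query.match(buildTagPattern(prefixes)) ?? []));
}
```

With this, `extractTagTokens("proj:AIF env:prod", ["proj", "env"])` would pick up both tokens, while the default call keeps today's proj:-only behavior.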

Overall: approve once tests are added. Nice work! 👍

fix(retrieval): route proj: tag queries to BM25-only + mustContain

814e2ca

rwmjhb reviewed Mar 5, 2026

1034378361 wants to merge 1 commit into win4r:main from 1034378361:fix/tag-query-fts-only
