fix: batch embedding calls to prevent provider limit errors by haosenwang1018 · Pull Request #73 · zilliztech/memsearch

haosenwang1018 · 2026-02-20T14:44:22Z

Problem

memsearch index crashes with ClientError: 400 INVALID_ARGUMENT when a markdown file produces more than 100 chunks while using the Google embedding provider. This is because _embed_and_store() sends all chunks to embed() in a single call with no batching.

Fixes #72

Fix

Add batching in _embed_and_store() with a default batch_size of 96, which stays safely under the lowest known provider limit (Google: 100).
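
The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `embed_in_batches` and `fake_embed` are hypothetical names, the sketch is synchronous, and the real `_embed_and_store()` in `src/memsearch/core.py` may be structured differently (for example, it may be async and also handle storage).

```python
from typing import Callable, List, Sequence

# Google's per-request item limit, as stated in the PR description.
PROVIDER_LIMIT = 100


def embed_in_batches(
    texts: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
    batch_size: int = 96,
) -> List[List[float]]:
    """Call embed() on slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        vectors.extend(embed(batch))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for a provider call; rejects oversized requests."""
    if len(batch) > PROVIDER_LIMIT:
        raise ValueError("400 INVALID_ARGUMENT: too many items in one request")
    # One-dimensional dummy vector per input text.
    return [[float(len(text))] for text in batch]
```

With the default `batch_size` of 96, 250 chunks would be sent as three requests of 96, 96, and 58 items, each under the 100-item limit, while an input of 96 or fewer chunks still results in a single request.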

Changes

src/memsearch/core.py: Batch embed() calls in _embed_and_store()

The change is minimal and backward-compatible: behavior is unchanged for files that produce 96 or fewer chunks, which are still embedded in a single call.

When a markdown file produces more than 100 chunks, the single embed() call exceeds Google's BatchEmbedContentsRequest limit of 100 items, causing a 400 INVALID_ARGUMENT error.

Add batching in _embed_and_store() with a default batch_size of 96, which stays safely under the lowest known provider limit (Google: 100). This also future-proofs against similar limits in other providers (Voyage: 128, OpenAI: 2048).

Fixes zilliztech#72

haosenwang1018 mentioned this pull request Feb 20, 2026

fix: batch Google embedding calls to respect 100-item API limit #78

Open

haosenwang1018 wants to merge 1 commit into zilliztech:main from haosenwang1018:fix/embed-batch-size-limit
