Skip to content

Feature/hnsw vector search #187

Merged
DenisovAV merged 9 commits into main from feature/hnsw-vector-search on Mar 4, 2026

@DenisovAV
Owner

No description provided.

Renamed SentencePiece's bundled protobuf namespace from google::protobuf
to google::protobuf_sp to avoid symbol conflicts with MediaPipe's protobuf.

Root cause: Both SentencePiece and MediaPipe exported google::protobuf::*
symbols. At link time, the linker arbitrarily chose one implementation,
causing memory corruption when mismatched vtables were used.

Changes:
- Renamed namespace in port_def.inc: google::protobuf -> google::protobuf_sp
- Updated all 75 protobuf-lite source files with new namespace
- Removed unused protobuf_namespace.h
- Cleaned up podspec preprocessor definitions

Fixes: #184

Implements Hierarchical Navigable Small World (HNSW) algorithm for
fast approximate nearest neighbor search in VectorStore.

Architecture:
- SQLite remains source of truth (persistence)
- HNSW serves as in-memory cache (fast search)
- Hybrid search: HNSW candidates -> exact similarity recalculation
- Threshold: HNSW used when document count >= 100
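
A minimal Python sketch of the hybrid search flow above, assuming cosine similarity as the metric. Only the threshold of 100 comes from this PR; `hybrid_search`, `ann_candidates` and the over-fetch factor of 3 are hypothetical. `ann_candidates` stands in for the HNSW index and is not the local_hnsw API.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# From the PR description: HNSW is only used at >= 100 documents.
HNSW_THRESHOLD = 100

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Exact cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(
    query: Vector,
    docs: Dict[str, Vector],
    top_k: int,
    ann_candidates: Callable[[Vector, int], List[str]],
    over_fetch: int = 3,  # hypothetical over-fetch factor
) -> List[Tuple[str, float]]:
    """Return up to top_k (id, exact cosine score) pairs, best first."""
    if len(docs) < HNSW_THRESHOLD:
        # Small store: brute force over every document.
        candidate_ids = list(docs)
    else:
        # Large store: ask the approximate index for extra candidates and
        # drop ids that the source of truth no longer contains.
        candidate_ids = [
            doc_id for doc_id in ann_candidates(query, top_k * over_fetch)
            if doc_id in docs
        ]
    # Exact rerank: every candidate is rescored with the true similarity.
    scored = [(doc_id, cosine(query, docs[doc_id])) for doc_id in candidate_ids]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Over-fetching gives the exact rerank a chance to recover true neighbours that the approximate index ranked slightly too low.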

Changes:
- Added local_hnsw dependency (pure Dart, cross-platform)
- Created HnswVectorIndex wrapper with add/search/rebuild/clear
- Added Pigeon methods: getAllDocumentsWithEmbeddings, getDocumentsByIds
- Implemented native methods in Android (Kotlin), iOS (Swift), Web (JS)
- Updated MobileVectorStoreRepository and WebVectorStoreRepository
- Added 21 unit tests for HnswVectorIndex
- Added 5 HNSW vs brute-force parity tests
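
One way a parity test could score HNSW against brute force is recall@k: the share of the exact top-k ids that the approximate search also returned. This Python helper is a hypothetical illustration, not the test code from this PR.

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[str], exact_ids: Sequence[str], k: int) -> float:
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        # Nothing to find, so nothing was missed.
        return 1.0
    hits = len(exact_top & set(approx_ids[:k]))
    return hits / len(exact_top)
```

For example, approximate results ["a", "b", "d"] against exact results ["a", "b", "c"] give a recall@3 of 2/3.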

Performance:
- Search complexity: roughly O(log n) per query (typical HNSW behaviour, not a worst-case bound) vs O(n) brute-force
- Index rebuilt on initialize() from SQLite data
- Documents synced to both SQLite and HNSW on add
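
A hedged Python sketch of the SQLite-as-source-of-truth pattern: `add` writes to SQLite and then to the in-memory cache, and `initialize` rebuilds the cache from SQLite. The table schema, the float32 blob encoding and every method name are assumptions (the two query methods only mirror the Pigeon names getAllDocumentsWithEmbeddings and getDocumentsByIds), and a plain dict stands in for the HNSW graph.

```python
import sqlite3
import struct
from typing import Dict, List, Sequence, Tuple


def _pack(vec: Sequence[float]) -> bytes:
    # Assumed layout: little-endian float32 values stored as a BLOB.
    return struct.pack(f"<{len(vec)}f", *vec)


def _unpack(blob: bytes) -> List[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


class VectorStoreSketch:
    """SQLite holds the data; `index` is a rebuildable in-memory cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.index: Dict[str, List[float]] = {}  # stand-in for the HNSW graph
        conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "id TEXT PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
        )

    def initialize(self) -> None:
        # Rebuild the cache entirely from the persisted rows.
        self.index = {
            doc_id: emb for doc_id, _, emb in self.get_all_documents_with_embeddings()
        }

    def add(self, doc_id: str, content: str, embedding: Sequence[float]) -> None:
        # Persist first (committed by the context manager), then update the cache.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO documents (id, content, embedding) VALUES (?, ?, ?)",
                (doc_id, content, _pack(embedding)),
            )
        self.index[doc_id] = list(embedding)

    def get_all_documents_with_embeddings(self) -> List[Tuple[str, str, List[float]]]:
        rows = self.conn.execute("SELECT id, content, embedding FROM documents")
        return [(r[0], r[1], _unpack(r[2])) for r in rows]

    def get_documents_by_ids(self, ids: Sequence[str]) -> Dict[str, str]:
        # Hydrate candidate ids from the index; one "?" placeholder per id.
        if not ids:
            return {}
        placeholders = ",".join("?" for _ in ids)
        rows = self.conn.execute(
            f"SELECT id, content FROM documents WHERE id IN ({placeholders})", list(ids)
        )
        return {r[0]: r[1] for r in rows}
```

Because the cache can always be rebuilt from SQLite, losing it on restart costs only the rebuild time in initialize().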

Copilot AI left a comment

Pull request overview

Adds an in-memory HNSW index on top of the existing SQLite-backed VectorStore to speed up similarity search, plus new cross-platform APIs to bulk-load embeddings (for rebuild) and fetch documents by ID (for candidate hydration).

Changes:

  • Introduce HnswVectorIndex (Dart) and integrate it into web + mobile repositories with rebuild-on-initialize and hybrid search flow.
  • Add new platform APIs: getAllDocumentsWithEmbeddings and getDocumentsByIds across Web (JS worker), Android (Kotlin), and iOS (Swift) + Pigeon plumbing.
  • Adjust iOS sentencepiece/protobuf-lite namespace to avoid symbol conflicts (protobuf → protobuf_sp), and add HNSW-focused tests.

Reviewed changes

Copilot reviewed 97 out of 102 changed files in this pull request and generated 8 comments.

File Description
web/sqlite_vector_store.js Web proxy: add worker fetch-via-Blob and expose new worker methods for HNSW rebuild/hydration.
web/rag/sqlite_vector_store_worker.js Worker: implement getAllDocumentsWithEmbeddings + getDocumentsByIds and wire message handler cases.
web/rag/sqlite_vector_store.js Web proxy (rag): expose getAllDocumentsWithEmbeddings + getDocumentsByIds.
test/vector_store_parity_test.dart Add parity tests comparing HNSW results vs brute-force cosine similarity.
test/hnsw_index_test.dart New unit tests for HnswVectorIndex behaviors (add/search/rebuild/remove/threshold).
pubspec.yaml Add local_hnsw dependency.
pubspec.lock Lockfile update for local_hnsw.
pigeon.dart Add Pigeon APIs + DocumentWithEmbedding model for HNSW rebuild.
lib/web/vector_store_web.dart Extend JS interop with new methods + Dart-friendly parsers.
lib/pigeon.g.dart Regenerated Pigeon Dart bindings for new APIs/types.
lib/core/infrastructure/web_vector_store_repository.dart Integrate HNSW cache layer and hybrid search flow on web.
lib/core/infrastructure/mobile_vector_store_repository.dart Integrate HNSW cache layer and hybrid search flow on mobile.
lib/core/infrastructure/hnsw_vector_index.dart New HNSW wrapper around local_hnsw with over-fetch + rerank.
ios/flutter_gemma.podspec Update protobuf conflict handling approach (namespace rename moved into sources).
ios/Classes/sentencepiece/third_party/protobuf-lite/zero_copy_stream_impl_lite.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/zero_copy_stream_impl.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/zero_copy_stream.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/wire_format_lite.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/time.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/strutil.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/structurally_valid.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/stringprintf.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/stringpiece.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/statusor.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/status.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/repeated_field.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/parse_context.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/message_lite.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/io_win32.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/int128.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/implicit_weak_message.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/wire_format_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/unknown_field_set.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/time.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/strutil.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/stringprintf.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/stringpiece.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/stl_util.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/statusor.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/status.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/port.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/once.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/mutex.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/map_util.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/macros.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/logging.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/int128.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/hash.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/common.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/casts.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/callback.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/stubs/bytestream.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/repeated_field.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/port_def.inc Change PROTOBUF_NAMESPACE macros to google::protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/parse_context.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/metadata_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/message_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/map_type_handler.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/map_field_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/map_entry_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/map.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/io/zero_copy_stream_impl_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/io/zero_copy_stream_impl.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/io/zero_copy_stream.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/io/io_win32.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/io/coded_stream.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/implicit_weak_message.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/has_bits.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/generated_message_util.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/generated_message_table_driven_lite.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/generated_message_table_driven.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/generated_enum_util.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/generated_enum_reflection.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/extension_set_inl.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/extension_set.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/descriptor.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/arenastring.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/arena_impl.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/arena.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/google/protobuf/any.h Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/generated_message_util.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/generated_message_table_driven_lite.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/generated_enum_util.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/extension_set.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/common.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/coded_stream.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/bytestream.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/arenastring.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/third_party/protobuf-lite/arena.cc Rename protobuf namespace to protobuf_sp.
ios/Classes/sentencepiece/src/init.cc Update shutdown call to google::protobuf_sp::ShutdownProtobufLibrary().
ios/Classes/sentencepiece/protobuf_namespace.h Remove old macro-based namespace renaming header.
ios/Classes/VectorStore.swift Add iOS implementations of new bulk embedding fetch + fetch-by-IDs APIs.
ios/Classes/PigeonInterface.g.swift Regenerated Pigeon Swift bindings for new APIs/types.
ios/Classes/FlutterGemmaPlugin.swift Implement new Pigeon calls for iOS platform service.
example/web/sqlite_vector_store.js Mirror web proxy updates for the example app.
example/pubspec.lock Example lockfile update (package version + local_hnsw transitive).
example/ios/Podfile.lock Example iOS lockfile update for plugin version.
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/VectorStore.kt Add Android implementations of new bulk embedding fetch + fetch-by-IDs APIs.
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/PigeonInterface.g.kt Regenerated Pigeon Kotlin bindings for new APIs/types.
android/src/main/kotlin/dev/flutterberlin/flutter_gemma/FlutterGemmaPlugin.kt Implement new Pigeon calls for Android platform service.

Namespace is already defined in build.gradle, so the package
attribute in AndroidManifest.xml is redundant and produces a
lint warning.

Fixes #190
…NPACK

Two root causes of incorrect iOS embeddings (diffSimilarity ~0.79):

1. UnigramTokenizer used Viterbi algorithm on BPE vocab, producing 30
   tokens instead of 15. Replaced with BPETokenizer implementing
   SentencePiece greedy pair-merge algorithm via linked list + priority
   queue — matches SentencePiece C++ output exactly.

2. XNNPACK was disabled in v0.11.16 (#155) to work around a crash, but
   the crash was caused by SentencePiece C++ protobuf symbol conflict
   with TFLite — not XNNPACK itself. Re-enabled now that C++ is removed.
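
The SentencePiece-style greedy pair-merge from point 1 can be sketched in Python as below. This is an illustrative reimplementation, not the Swift BPETokenizer from this PR: it assumes already-normalized input (spaces written as "▁"), uses a hypothetical piece-to-score dict as the merge priority, and omits unknown-character and byte-fallback handling. Stale heap entries are skipped lazily rather than removed.

```python
import heapq
from typing import Dict, List


def bpe_encode(text: str, scores: Dict[str, float]) -> List[str]:
    """Repeatedly merge the adjacent pair whose merged piece scores highest."""
    # Doubly linked list over the initial characters.
    pieces: List[str] = list(text)
    prev = [i - 1 for i in range(len(pieces))]
    nxt = [i + 1 if i + 1 < len(pieces) else -1 for i in range(len(pieces))]
    alive = [True] * len(pieces)
    heap: List[tuple] = []

    def push(left: int, right: int) -> None:
        if left < 0 or right < 0:
            return
        merged = pieces[left] + pieces[right]
        if merged in scores:
            # heapq is a min-heap: negate the score; the left index breaks ties.
            heapq.heappush(heap, (-scores[merged], left, right, merged))

    for i in range(len(pieces) - 1):
        push(i, i + 1)

    while heap:
        _, left, right, merged = heapq.heappop(heap)
        # Skip entries invalidated by earlier merges.
        if not (alive[left] and alive[right]) or nxt[left] != right:
            continue
        if pieces[left] + pieces[right] != merged:
            continue
        # Merge right into left and unlink right.
        pieces[left] = merged
        alive[right] = False
        nxt[left] = nxt[right]
        if nxt[right] != -1:
            prev[nxt[right]] = left
        # Only the two new neighbouring pairs need to be queued.
        push(prev[left], left)
        push(left, nxt[left])

    # Node 0 is never unlinked, so the list always starts there.
    out: List[str] = []
    i = 0 if pieces else -1
    while i != -1:
        out.append(pieces[i])
        i = nxt[i]
    return out
```

With the toy scores {"he": 5.0, "ll": 4.0, "hell": 3.0, "hello": 2.0, "▁hello": 1.0}, bpe_encode("▁hello", ...) merges he, ll, hell, hello and finally ▁hello, returning ["▁hello"].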

Verified with Python TFLite: XNNPACK ON gives diffSimilarity=0.13,
XNNPACK OFF gives 0.75 — mixed-precision models require XNNPACK.

Also includes:
- iosPath parameter for tokenizer.json on iOS (avoids .model protobuf conflict)
- Remove ~150 SentencePiece C++ / protobuf-lite source files
- Simplify podspec to Swift-only sources
- Migrate integration tests to Patrol
- Add embedding stability integration test
- Remove GemmaEmbeddingWrapper: plugin calls EmbeddingModel directly
  (matches Android architecture, fixes double task prefix bug)
- Fix README/CLAUDE.md: iosUrl → iosPath to match actual API parameter
- Add iosToken parameter to tokenizerFromNetwork for separate iOS auth
- Fix _validateIosCompatibility: parse URI path instead of full URL
  (handles query params correctly)
- Remove hardcoded /Users/sashadenisov/ path from desktop test
…allationBuilder

The validation only applies to tokenizer files, not all network downloads.
Also improve doc comments for iosToken parameter.
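
A sketch of the URI-path fix, assuming the check tests whether a tokenizer URL points at a .json file (the actual rule in _validateIosCompatibility may differ). Both function names are hypothetical. Reading the parsed path keeps a query string such as ?download=true from hiding the extension, and the is_tokenizer flag limits the check to tokenizer downloads.

```python
from urllib.parse import urlparse


def is_ios_compatible_tokenizer_url(url: str) -> bool:
    """True if the URL path (not the full URL) ends in .json."""
    # urlparse separates "?query" and "#fragment" from the path, so they
    # cannot break the check the way they would with url.endswith(".json").
    return urlparse(url).path.lower().endswith(".json")


def validate_download(url: str, is_tokenizer: bool) -> None:
    """Apply the iOS tokenizer check to tokenizer downloads only."""
    if is_tokenizer and not is_ios_compatible_tokenizer_url(url):
        raise ValueError(f"iOS needs a tokenizer.json file, got: {url}")
```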
…op JAR URL

- Restore UnigramTokenizer.swift (Trie + Viterbi) for Gecko models
- Auto-detect BPE vs Unigram from tokenizer.json model.type field
- Add dual-model integration test (EmbeddingGemma + Gecko)
- Update JAR_VERSION to 0.12.5 in macOS/Linux setup scripts (#189)
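
A Python sketch of the model-type detection and of Unigram segmentation by Viterbi. It assumes the Hugging Face tokenizer.json layout, where model.type is "BPE" or "Unigram" and a Unigram model.vocab is a list of [piece, score] pairs. The function names are hypothetical, a dict lookup over bounded-length substrings replaces the trie, and unknown-character handling is reduced to an error.

```python
import json
import math
from typing import Dict, List


def detect_model_type(tokenizer_json: str) -> str:
    """Read model.type from a tokenizer.json string."""
    model_type = json.loads(tokenizer_json)["model"]["type"]
    if model_type not in ("BPE", "Unigram"):
        raise ValueError(f"unsupported tokenizer model type: {model_type}")
    return model_type


def unigram_vocab(tokenizer_json: str) -> Dict[str, float]:
    """Map piece -> score from a Unigram model's [piece, score] vocab list."""
    return {piece: score for piece, score in json.loads(tokenizer_json)["model"]["vocab"]}


def unigram_viterbi(text: str, log_probs: Dict[str, float]) -> List[str]:
    """Segment text into vocab pieces maximizing the summed log-probability."""
    max_len = max((len(p) for p in log_probs), default=1)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs and best[start] != -math.inf:
                score = best[start] + log_probs[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if n and best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end to recover the pieces.
    pieces: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(text[start:end])
        end = start
    return pieces[::-1]
```

Unlike the greedy BPE merge, Viterbi scores whole segmentations, which is why running it on a BPE vocabulary can yield a different (and, per this PR, longer) token sequence.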
DenisovAV merged commit 5857c00 into main on Mar 4, 2026
3 checks passed

2 participants