#357: Document LSMVector index implementation #358
Summary of Changes

Hello @robfrank, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly updates the vector embeddings documentation to align with the modern LSMVectorIndex implementation, which is built on the JVector library. The primary goal is to provide clear, comprehensive guidance for users, emphasizing SQL integration and robust transaction support, while also offering a migration path and best practices for optimal performance and scalability.
CI Insights: all jobs passed for the CI run on 2ddbd3f.
Code Review
This pull request provides a comprehensive rewrite of the vector embeddings documentation, shifting the focus to the modern LSMVectorIndex implementation. The new documentation is well-structured, with a SQL-first approach, detailed examples for both SQL and the Java API, and sections on best practices and migration. My review focuses on improving the clarity and completeness of the code examples to ensure they are easy for users to understand and adapt. I've suggested defining missing variables in several Java snippets and improving the consistency of SQL query examples.
Improve code example clarity and completeness:

- Update Basic Similarity Search to use named parameters ($queryVector) instead of hardcoded vector arrays for consistency with best practices
- Update "Combining with Other Filters" example to use named parameters
- Add clarification note explaining the difference between using full index names (e.g., 'Document[embedding]') in SELECT clauses vs property names (e.g., 'embedding') in WHERE clauses
- Define queryVector variable in "Querying Vectors from Java" example with type and comment about matching dimensions
- Define embeddingVector variable in "Transaction Support" example with type and comment about embedding model generation
- Complete legacy bulk loading pattern example with all necessary variable definitions, imports, and initialization code for clarity

These changes make the documentation examples more complete, runnable, and easier for users to understand and adapt.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Rewrite vector embeddings documentation to focus on the new LSMVectorIndex implementation built on the JVector library. This replaces the outdated HnswVectorIndexRAM-centric approach with comprehensive coverage of:

- LSMVectorIndex features and architecture
- SQL-first approach for index creation and querying
- Java API programmatic alternatives
- All three similarity functions (COSINE, DOT_PRODUCT, EUCLIDEAN)
- Transaction support and automatic persistence
- Performance tuning parameters
- Real-world use cases (semantic search, image similarity, recommendations)
- Migration guide from legacy HnswVectorIndexRAM

The documentation now emphasizes LSMVectorIndex as the primary production approach while maintaining backwards compatibility information.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Rewrite the Vector Model section in multimodel.adoc to reflect the new LSMVectorIndex implementation based on JVector 4.0.0 library:

- Replace outdated HnswVectorIndexRAM examples with modern LSMVectorIndex
- Highlight key features: persistent storage, transaction support, automatic compaction, and SQL integration
- Update SQL example to show modern CREATE INDEX LSM_VECTOR syntax
- Update Java example to use LSMVectorIndexBuilder pattern
- Simplify configuration parameters section (dimensions, similarity, maxConnections, beamWidth)
- Reduce similarity functions table to the three main supported metrics (COSINE, DOT_PRODUCT, EUCLIDEAN)
- Add cross-reference to detailed java-vectors documentation section

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
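For reviewers less familiar with the three metrics named in these commits (COSINE, DOT_PRODUCT, EUCLIDEAN), the following is a minimal Java sketch of their textbook definitions. It is an illustration only: the class and method names are hypothetical and are not part of ArcadeDB or JVector, and the index itself may normalize or rescale scores differently from the raw values computed here.

```java
// Hypothetical helper for illustration; not an ArcadeDB or JVector class.
public final class SimilaritySketch {
    private SimilaritySketch() {}

    // Query and stored vectors must have the same number of dimensions.
    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    // DOT_PRODUCT: sum of element-wise products (larger means more similar).
    public static double dotProduct(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // COSINE: dot product divided by the product of the vector lengths,
    // so only direction matters, not magnitude.
    public static double cosine(float[] a, float[] b) {
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dotProduct(a, b) / (normA * normB);
    }

    // EUCLIDEAN: straight-line distance (smaller means more similar).
    public static double euclideanDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f, 0f};
        float[] sameDirection = {2f, 0f, 0f};
        float[] orthogonal = {0f, 3f, 0f};

        System.out.println("cosine(query, sameDirection) = " + cosine(query, sameDirection));
        System.out.println("dot(query, sameDirection)    = " + dotProduct(query, sameDirection));
        System.out.println("dist(query, sameDirection)   = " + euclideanDistance(query, sameDirection));
        System.out.println("cosine(query, orthogonal)    = " + cosine(query, orthogonal));
    }
}
```

The dimension check mirrors the review note that a query vector should match the dimensions of the indexed vectors. Cosine ignores magnitude, which is why the two parallel vectors in `main` score as identical under cosine while still being one unit apart under Euclidean distance.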
Summary
Completely rewrites the vector embeddings documentation to focus on the new LSMVectorIndex implementation built on the JVector library. This modernizes the documentation from the outdated HnswVectorIndexRAM-centric approach.
Changes
Key Improvements
Related Issues
Closes #357