
@robfrank (Contributor)

Summary

Completely rewrites the vector embeddings documentation to focus on the new LSMVectorIndex implementation built on the JVector library, replacing the outdated HnswVectorIndexRAM-centric approach.

Changes

  • Reorganized structure: SQL-first approach with Java API as programmatic alternative
  • Comprehensive SQL examples: Index creation with all three similarity functions (COSINE, DOT_PRODUCT, EUCLIDEAN)
  • Enhanced Java API documentation: Programmatic index creation, vector insertion with callbacks, transaction support
  • Real-world use cases: Semantic document search, image similarity search, recommendation systems
  • Migration guide: Clear comparison between HnswVectorIndexRAM and LSMVectorIndex with backwards compatibility information
  • Best practices section: 7 actionable practices for optimal performance
  • Technical accuracy: Proper AsciiDoc syntax, accurate imports, correct SQL and Java API examples
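As a reference point for the three similarity functions listed above, here is a minimal plain-Java sketch of what each one computes on two float vectors. It is illustrative only: the class and method names (`VectorSimilarity`, `dotProduct`, `cosine`, `euclidean`) are hypothetical, and this is not the LSMVectorIndex or JVector code, which may transform these raw values into index-specific scores.

```java
/**
 * Hypothetical helper illustrating COSINE, DOT_PRODUCT and EUCLIDEAN.
 * Not ArcadeDB's or JVector's implementation; raw values only.
 */
public final class VectorSimilarity {

    private VectorSimilarity() {}

    /** DOT_PRODUCT: sum of element-wise products; larger means more similar. */
    public static float dotProduct(float[] a, float[] b) {
        checkDimensions(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** COSINE: dot product divided by the product of the two vector norms. */
    public static float cosine(float[] a, float[] b) {
        checkDimensions(a, b);
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine is undefined for zero vectors");
        }
        return (float) (dotProduct(a, b) / (normA * normB));
    }

    /** EUCLIDEAN: straight-line (L2) distance; smaller means closer. */
    public static float euclidean(float[] a, float[] b) {
        checkDimensions(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    /** All vectors compared against an index must share the same dimensions. */
    private static void checkDimensions(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] doc = {1f, 1f};
        System.out.println("dot       = " + dotProduct(query, doc)); // prints 1.0
        System.out.println("cosine    = " + cosine(query, doc));
        System.out.println("euclidean = " + euclidean(query, doc)); // prints 1.0
    }
}
```

COSINE and DOT_PRODUCT rank larger values as more similar, while raw EUCLIDEAN distance ranks smaller values as closer. For vectors normalized to unit length, cosine similarity and dot product are equal and therefore produce the same ranking.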

Key Improvements

  1. LSMVectorIndex as Primary Approach: Positioned as the main production solution with full transaction support and automatic persistence
  2. SQL Integration Focus: Clear examples of creating and querying indexes via SQL, making it accessible to more users
  3. Performance Tuning: Detailed guidance on configurable parameters (maxConnections, beamWidth) with tuning recommendations
  4. Transaction Support: Explicit examples showing how automatic persistence works with ACID compliance
  5. Complete Feature Coverage: All similarity functions, metadata configuration, and automatic compaction documented
  6. Backwards Compatibility: Legacy bulk loading pattern documented for existing users while strongly recommending LSMVectorIndex

Related Issues

Closes #357

@gemini-code-assist
Summary of Changes

Hello @robfrank, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly updates the vector embeddings documentation to align with the modern LSMVectorIndex implementation, which is built on the JVector library. The primary goal is to provide clear, comprehensive guidance for users, emphasizing SQL integration and robust transaction support, while also offering a migration path and best practices for optimal performance and scalability.

Highlights

  • Documentation Rewrite: The vector embeddings documentation has been completely rewritten to focus on the new LSMVectorIndex implementation, moving away from the outdated HnswVectorIndexRAM-centric approach.
  • SQL-First Approach: The documentation now adopts a SQL-first approach for vector index creation and querying, with the Java API presented as a programmatic alternative.
  • Comprehensive SQL Examples: New comprehensive SQL examples are provided for index creation using all three similarity functions (COSINE, DOT_PRODUCT, EUCLIDEAN) and for querying with vectorNeighbors().
  • Enhanced Java API: The Java API documentation has been enhanced to cover programmatic LSMVectorIndex creation, vector insertion with callbacks, and explicit transaction support.
  • Migration Guide & Best Practices: A clear migration guide compares HnswVectorIndexRAM and LSMVectorIndex, and a new section outlines 7 best practices for optimal performance.
  • Real-World Use Cases: The documentation includes practical examples for common use cases such as semantic document search, image similarity search, and recommendation systems.
@mergify (Contributor)

mergify bot commented Dec 10, 2025

🧪 CI Insights

Here's what we observed from your CI run for 2ddbd3f.

🟢 All jobs passed!


@gemini-code-assist bot left a comment

Code Review

This pull request provides a comprehensive rewrite of the vector embeddings documentation, shifting the focus to the modern LSMVectorIndex implementation. The new documentation is well-structured, with a SQL-first approach, detailed examples for both SQL and the Java API, and sections on best practices and migration. My review focuses on improving the clarity and completeness of the code examples to ensure they are easy for users to understand and adapt. I've suggested defining missing variables in several Java snippets and improving the consistency of SQL query examples.

robfrank added a commit that referenced this pull request Dec 10, 2025
Improve code example clarity and completeness:

- Update Basic Similarity Search to use named parameters ($queryVector)
  instead of hardcoded vector arrays for consistency with best practices
- Update "Combining with Other Filters" example to use named parameters
- Add clarification note explaining the difference between using full
  index names (e.g., 'Document[embedding]') in SELECT clauses vs property
  names (e.g., 'embedding') in WHERE clauses
- Define queryVector variable in "Querying Vectors from Java" example
  with type and comment about matching dimensions
- Define embeddingVector variable in "Transaction Support" example
  with type and comment about embedding model generation
- Complete legacy bulk loading pattern example with all necessary
  variable definitions, imports, and initialization code for clarity

These changes make the documentation examples more complete, runnable,
and easier for users to understand and adapt.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
robfrank and others added 4 commits December 19, 2025 19:12
Rewrite vector embeddings documentation to focus on the new LSMVectorIndex implementation built on the JVector library. This replaces the outdated HnswVectorIndexRAM-centric approach with comprehensive coverage of:

- LSMVectorIndex features and architecture
- SQL-first approach for index creation and querying
- Java API programmatic alternatives
- All three similarity functions (COSINE, DOT_PRODUCT, EUCLIDEAN)
- Transaction support and automatic persistence
- Performance tuning parameters
- Real-world use cases (semantic search, image similarity, recommendations)
- Migration guide from legacy HnswVectorIndexRAM

The documentation now emphasizes LSMVectorIndex as the primary production approach while maintaining backwards compatibility information.

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Improve code example clarity and completeness
Rewrite the Vector Model section in multimodel.adoc to reflect the new
LSMVectorIndex implementation based on the JVector 4.0.0 library:

- Replace outdated HnswVectorIndexRAM examples with modern LSMVectorIndex
- Highlight key features: persistent storage, transaction support,
  automatic compaction, and SQL integration
- Update SQL example to show modern CREATE INDEX LSM_VECTOR syntax
- Update Java example to use LSMVectorIndexBuilder pattern
- Simplify configuration parameters section (dimensions, similarity,
  maxConnections, beamWidth)
- Reduce similarity functions table to the three main supported metrics
  (COSINE, DOT_PRODUCT, EUCLIDEAN)
- Add cross-reference to detailed java-vectors documentation section

🤖 Generated with Claude Code

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
robfrank force-pushed the doc/357-vector-index branch from ff5b3f7 to 2ddbd3f on December 19, 2025 18:12
robfrank merged commit 90be670 into main on Dec 19, 2025
2 of 3 checks passed
Development

Successfully merging this pull request may close these issues.

Document LSMVector index