Description
Hi! Following up on #167 / #180 — while testing the HNSW index for a pgvector comparison benchmark, I found several issues that cause crashes and incorrect results. Fix is in #181.
Bugs
1. SIGSEGV on repeated k-NN queries (crash)
Running two or more ORDER BY embedding <=> query LIMIT k queries in the same psql session crashes the PostgreSQL backend with signal 11 (segfault). The fault address is typically 0x1, i.e. a garbage pointer read from stale palloc memory.
Root cause: hnsw_beginscan calls RelationGetIndexScan but never allocates the xs_orderbyvals / xs_orderbynulls arrays. The executor assumes these are valid and writes distance values into whatever memory they happen to point at. First query often works (palloc0'd to null); second query crashes.
Repro:
CREATE TABLE t (id serial, embedding ruvector(5));
INSERT INTO t (embedding) VALUES ('[1,0,0,0,0]'), ('[0,1,0,0,0]'), ('[0,0,1,0,0]');
CREATE INDEX ON t USING hnsw (embedding ruvector_cosine_ops);
SET enable_seqscan = off;
-- First query: usually works
SELECT id FROM t ORDER BY embedding <=> '[1,0,0,0,0]'::ruvector LIMIT 3;
-- Second query: SIGSEGV
SELECT id FROM t ORDER BY embedding <=> '[0,1,0,0,0]'::ruvector LIMIT 3;
2. Empty HNSW graph (no results)
connect_node_to_neighbors is a no-op TODO stub, so hnsw_build inserts nodes but never creates edges. The search traversal finds only the entry point.
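For illustration, here is a minimal sketch of what the edge-creation step could look like, assuming a single-layer graph stored as adjacency lists. The `Graph` struct, its fields, and this `connect_node_to_neighbors` signature are hypothetical and not ruvector's actual types; pruning by squared L2 distance is also a simplification of the neighbor-selection heuristic in real HNSW.

```rust
/// Hypothetical single-layer graph: node `i` has vector `vectors[i]` and
/// adjacency list `edges[i]`. This is NOT ruvector's real data structure.
pub struct Graph {
    pub vectors: Vec<Vec<f32>>,
    pub edges: Vec<Vec<usize>>,
    pub max_neighbors: usize,
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl Graph {
    pub fn new(max_neighbors: usize) -> Self {
        Graph { vectors: Vec::new(), edges: Vec::new(), max_neighbors }
    }

    /// Adds a node with no edges and returns its id.
    pub fn add_node(&mut self, vector: Vec<f32>) -> usize {
        self.vectors.push(vector);
        self.edges.push(Vec::new());
        self.vectors.len() - 1
    }

    /// Links `node` to every id in `neighbors` in both directions.
    /// Without this step the graph has nodes but no edges, so a traversal
    /// can never leave the entry point.
    pub fn connect_node_to_neighbors(&mut self, node: usize, neighbors: &[usize]) {
        for &n in neighbors {
            if n == node {
                continue; // no self-loops
            }
            self.link(node, n);
            self.link(n, node);
        }
    }

    /// Adds the directed edge `from -> to`; if the list grows past
    /// `max_neighbors`, keeps only the entries closest to `from`.
    fn link(&mut self, from: usize, to: usize) {
        if self.edges[from].contains(&to) {
            return;
        }
        self.edges[from].push(to);
        if self.edges[from].len() > self.max_neighbors {
            let vectors = &self.vectors;
            let base = &vectors[from];
            self.edges[from].sort_by(|&a, &b| {
                squared_l2(base, &vectors[a]).total_cmp(&squared_l2(base, &vectors[b]))
            });
            self.edges[from].truncate(self.max_neighbors);
        }
    }
}
```

Pruning can leave an edge in only one direction (a node may drop a far neighbor that still points back at it), which is expected in this kind of graph.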
3. Wrong distance metric (wrong results)
hnsw_build uses HnswConfig::default() which hardcodes DistanceMetric::Euclidean, even when the index is created with ruvector_cosine_ops. The search computes Euclidean distances on data that should use cosine similarity.
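A rough sketch of the idea behind the fix: derive the metric from the operator class instead of taking it from `Default`. The names `HnswConfig` and `DistanceMetric` come from this report, but the single `metric` field, the helper `config_for_opclass`, and the opclass name `ruvector_l2_ops` are assumptions made for illustration only.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DistanceMetric {
    Euclidean,
    Cosine,
}

impl DistanceMetric {
    pub fn distance(self, a: &[f32], b: &[f32]) -> f32 {
        match self {
            DistanceMetric::Euclidean => a
                .iter()
                .zip(b)
                .map(|(x, y)| (x - y) * (x - y))
                .sum::<f32>()
                .sqrt(),
            DistanceMetric::Cosine => {
                let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
                let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
                let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
                if na == 0.0 || nb == 0.0 {
                    return 1.0; // convention chosen for this sketch
                }
                1.0 - dot / (na * nb)
            }
        }
    }
}

/// Simplified stand-in for the real config; only the metric is modeled.
pub struct HnswConfig {
    pub metric: DistanceMetric,
}

impl Default for HnswConfig {
    // Mirrors the behavior described above: the default is Euclidean.
    fn default() -> Self {
        HnswConfig { metric: DistanceMetric::Euclidean }
    }
}

/// Hypothetical helper: pick the metric from the operator class name
/// rather than from `HnswConfig::default()`.
pub fn config_for_opclass(opclass: &str) -> Option<HnswConfig> {
    let metric = match opclass {
        "ruvector_cosine_ops" => DistanceMetric::Cosine,
        "ruvector_l2_ops" => DistanceMetric::Euclidean, // assumed name
        _ => return None,
    };
    Some(HnswConfig { metric })
}
```

The two metrics can disagree on which neighbor is nearest: for the query [1, 0], the vector [10, 0] is the closest under cosine distance (same direction) while [0.9, 0.5] is the closest under Euclidean distance.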
4. Wrong result ordering (wrong results)
BinaryHeap::into_iter().take(k) iterates the heap's backing array in arbitrary order, not sorted order. The results returned are k random candidates from the ef_search pool, not the k closest.
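One way to get the k closest candidates out of a `std::collections::BinaryHeap` is `into_sorted_vec()`, which returns the elements in ascending order. The sketch below assumes a hypothetical `Candidate` type ordered by distance; the type and the `top_k` helper are illustrative and not taken from the ruvector code.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical search candidate, ordered by distance and then by id.
#[derive(Debug, Clone, Copy)]
pub struct Candidate {
    pub distance: f32,
    pub id: u64,
}

impl Ord for Candidate {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives f32 a total order, so NaN cannot break the heap.
        self.distance
            .total_cmp(&other.distance)
            .then_with(|| self.id.cmp(&other.id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Candidate {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Candidate {}

/// Returns the `k` candidates with the smallest distance, closest first.
/// `into_sorted_vec` sorts ascending, unlike `into_iter`, which walks the
/// heap's backing storage in arbitrary order.
pub fn top_k(heap: BinaryHeap<Candidate>, k: usize) -> Vec<Candidate> {
    let mut sorted = heap.into_sorted_vec();
    sorted.truncate(k);
    sorted
}
```

Sorting the whole candidate pool costs O(n log n) in its size, which is usually acceptable because the pool is bounded by ef_search.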
5. "index returned tuples in wrong order" (error on PG17)
If xs_recheckorderby is set to true, PG17's IndexNextWithReorder compares index-reported distances against recalculated distances from heap tuples. Floating-point precision differences cause spurious errors.
6. Use-after-free in endscan
hnsw_endscan unconditionally calls Box::from_raw on scan->opaque without checking for null, risking a double free if it is called after a rescan.
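The guard can be sketched in plain Rust as follows. `ScanDesc` and `ScanState` are hypothetical stand-ins for the PostgreSQL scan descriptor and the extension's per-scan state (in the real code `opaque` is an untyped pointer on the PostgreSQL struct), so treat this as an illustration of the pattern rather than the actual patch.

```rust
use std::ptr;

/// Hypothetical per-scan state owned by the index access method.
pub struct ScanState {
    pub results: Vec<u64>,
}

/// Stand-in for the part of the scan descriptor that holds `opaque`.
pub struct ScanDesc {
    pub opaque: *mut ScanState,
}

/// Allocates the state on the heap and hands ownership to the raw pointer.
pub fn begin_scan() -> ScanDesc {
    let state = Box::new(ScanState { results: Vec::new() });
    ScanDesc { opaque: Box::into_raw(state) }
}

/// Frees the state at most once. Returns true if something was freed.
pub fn end_scan(scan: &mut ScanDesc) -> bool {
    if scan.opaque.is_null() {
        return false; // nothing to free, or already freed
    }
    // SAFETY: a non-null `opaque` was produced by `Box::into_raw` in
    // `begin_scan` and has not been freed, because every free nulls it.
    unsafe {
        drop(Box::from_raw(scan.opaque));
    }
    scan.opaque = ptr::null_mut();
    true
}
```

Nulling the pointer after the free is what makes a second call harmless; the null check alone would not help if the stale pointer were left in place.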
Environment
- PostgreSQL 17.7
- pgrx 0.12.9
- ruvector-postgres 2.0.1
- Linux x86_64
Fix
PR #181 addresses all six issues. The same xs_orderbyvals allocation fix is also applied to ivfflat_ambeginscan.