HNSW index: SIGSEGV on repeated queries, empty graph, wrong metric and ordering #182

@grparry

Hi! Following up on #167 / #180 — while testing the HNSW index for a pgvector comparison benchmark, I found several issues that cause crashes and incorrect results. Fix is in #181.

Bugs

1. SIGSEGV on repeated k-NN queries (crash)

Running two or more ORDER BY embedding <=> query LIMIT k queries in the same psql session crashes the PostgreSQL backend with signal 11 (segfault). The fault address is typically 0x1, which is consistent with a write through stale palloc memory.

Root cause: hnsw_beginscan calls RelationGetIndexScan but never allocates the xs_orderbyvals / xs_orderbynulls arrays. The executor assumes these are valid and writes distance values into whatever memory they happen to point at. First query often works (palloc0'd to null); second query crashes.
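To make the invariant concrete, here is a minimal plain-Rust model of it, not the pgrx code from the PR. `OrderByScan`, `begin_scan_with_order_by` and `store_distance` are hypothetical names, and `Option<Vec<_>>` stands in for the raw xs_orderbyvals / xs_orderbynulls pointers. The idea is that beginscan allocates one slot per ORDER BY key up front, so nothing downstream ever touches an unallocated array.

```rust
/// Plain-Rust stand-in for the order-by output slots of a scan descriptor.
/// (Hypothetical type; the real fields are raw pointers on the scan descriptor.)
struct OrderByScan {
    /// One distance slot per ORDER BY key; `None` models "never allocated".
    orderby_vals: Option<Vec<f64>>,
    /// Matching null flags, set to true until a distance is stored.
    orderby_nulls: Option<Vec<bool>>,
}

/// Allocate the output slots at scan start, sized by the number of ORDER BY keys.
fn begin_scan_with_order_by(number_of_order_bys: usize) -> OrderByScan {
    if number_of_order_bys == 0 {
        return OrderByScan { orderby_vals: None, orderby_nulls: None };
    }
    OrderByScan {
        orderby_vals: Some(vec![0.0; number_of_order_bys]),
        orderby_nulls: Some(vec![true; number_of_order_bys]),
    }
}

/// Store a distance for one ORDER BY key, refusing to write if no slot exists.
fn store_distance(scan: &mut OrderByScan, key: usize, distance: f64) -> Result<(), String> {
    match (scan.orderby_vals.as_mut(), scan.orderby_nulls.as_mut()) {
        (Some(vals), Some(nulls)) if key < vals.len() => {
            vals[key] = distance;
            nulls[key] = false;
            Ok(())
        }
        _ => Err(format!("no order-by slot allocated for key {key}")),
    }
}
```

In safe Rust the missing allocation shows up as an `Err`; with raw pointers in the real access method the same mistake is an out-of-bounds write, which matches the crash described above.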

Repro:

CREATE TABLE t (id serial, embedding ruvector(5));
INSERT INTO t (embedding) VALUES ('[1,0,0,0,0]'), ('[0,1,0,0,0]'), ('[0,0,1,0,0]');
CREATE INDEX ON t USING hnsw (embedding ruvector_cosine_ops);
SET enable_seqscan = off;

-- First query: usually works
SELECT id FROM t ORDER BY embedding <=> '[1,0,0,0,0]'::ruvector LIMIT 3;

-- Second query: SIGSEGV
SELECT id FROM t ORDER BY embedding <=> '[0,1,0,0,0]'::ruvector LIMIT 3;

2. Empty HNSW graph (no results)

connect_node_to_neighbors is a no-op TODO stub, so hnsw_build inserts nodes but never creates edges. The search traversal finds only the entry point.
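For illustration, a rough single-layer sketch of what a non-stub connect_node_to_neighbors could do. Only the function name comes from the code base; the `Layer` type, the signature, and the pruning rule are assumptions. It links in both directions and, when a list exceeds m, keeps the m closest neighbours by squared Euclidean distance. Real HNSW implementations usually use a more careful neighbour-selection heuristic.

```rust
/// Squared Euclidean distance, used here only to rank neighbours.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical single-layer graph: edges[i] holds the neighbour ids of node i.
struct Layer {
    vectors: Vec<Vec<f32>>,
    edges: Vec<Vec<usize>>,
}

impl Layer {
    /// Insert a node with no edges and return its id.
    fn add_node(&mut self, vector: Vec<f32>) -> usize {
        self.vectors.push(vector);
        self.edges.push(Vec::new());
        self.vectors.len() - 1
    }

    /// Link `node` with each selected neighbour in both directions.
    fn connect_node_to_neighbors(&mut self, node: usize, neighbors: &[usize], m: usize) {
        for &n in neighbors {
            if n == node {
                continue; // no self-loops
            }
            self.link(node, n, m);
            self.link(n, node, m);
        }
    }

    /// Add a directed edge, then prune the list back to the m closest entries.
    fn link(&mut self, from: usize, to: usize, m: usize) {
        if self.edges[from].contains(&to) {
            return;
        }
        self.edges[from].push(to);
        if self.edges[from].len() > m {
            let vectors = &self.vectors;
            let base = &vectors[from];
            self.edges[from].sort_by(|&x, &y| {
                squared_l2(base, &vectors[x]).total_cmp(&squared_l2(base, &vectors[y]))
            });
            self.edges[from].truncate(m);
        }
    }
}
```

Pruning can leave an edge one-directional (the new node keeps its link to a neighbour whose own list is already full of closer nodes). That is expected and still gives the search a path beyond the entry point.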

3. Wrong distance metric (wrong results)

hnsw_build uses HnswConfig::default(), which hardcodes DistanceMetric::Euclidean even when the index is created with ruvector_cosine_ops. The search therefore ranks candidates by Euclidean distance on data that should be ranked by cosine distance.
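A small self-contained sketch of why this matters and of one way to derive the metric from the operator class. `DistanceMetric::Euclidean` and ruvector_cosine_ops appear in this report; the `Cosine` variant name, the `ruvector_l2_ops` string, and both helper functions are assumptions made for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum DistanceMetric {
    Euclidean,
    Cosine, // assumed variant name
}

/// Map an operator class name to its metric instead of relying on a default.
fn metric_for_opclass(opclass: &str) -> Option<DistanceMetric> {
    match opclass {
        "ruvector_cosine_ops" => Some(DistanceMetric::Cosine),
        "ruvector_l2_ops" => Some(DistanceMetric::Euclidean), // hypothetical name
        _ => None,
    }
}

/// Distance under the chosen metric. Zero-length vectors are not handled here.
fn distance(metric: DistanceMetric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        DistanceMetric::Euclidean => a
            .iter()
            .zip(b)
            .map(|(x, y)| (x - y) * (x - y))
            .sum::<f32>()
            .sqrt(),
        DistanceMetric::Cosine => {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            1.0 - dot / (norm_a * norm_b)
        }
    }
}
```

With query [1, 0] and candidates [10, 0] and [0.9, 0.5], the Euclidean metric prefers [0.9, 0.5] while the cosine metric prefers [10, 0], so a graph built and searched with the wrong metric returns a different nearest neighbour.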

4. Wrong result ordering (wrong results)

BinaryHeap::into_iter().take(k) iterates the heap's backing array in arbitrary order, not sorted order. The results returned are k random candidates from the ef_search pool, not the k closest.
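A minimal sketch of an ordered top-k extraction using only the standard library. The `Candidate` type and `top_k` helper are hypothetical and not taken from the code base; the point is that `BinaryHeap::into_sorted_vec` returns elements in ascending `Ord` order, unlike `into_iter`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical candidate: a node id plus its distance to the query.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    id: u32,
    distance: f32,
}

impl Eq for Candidate {}

impl Ord for Candidate {
    // Order by distance (total_cmp gives a total order over f32),
    // then by id so that ties are deterministic.
    fn cmp(&self, other: &Self) -> Ordering {
        self.distance
            .total_cmp(&other.distance)
            .then_with(|| self.id.cmp(&other.id))
    }
}

impl PartialOrd for Candidate {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Return the k closest candidates from the pool, nearest first.
fn top_k(pool: BinaryHeap<Candidate>, k: usize) -> Vec<Candidate> {
    // into_sorted_vec() is ascending under Ord, i.e. smallest distance first.
    let mut sorted = pool.into_sorted_vec();
    sorted.truncate(k);
    sorted
}
```

For a pool with distances 0.9, 0.1, 0.5, 0.3, 0.7 and k = 3, this returns the candidates at 0.1, 0.3 and 0.5 in that order, regardless of how the heap's backing array happens to be laid out.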

5. "index returned tuples in wrong order" (error on PG17)

If xs_recheckorderby is set to true, PG17's IndexNextWithReorder compares index-reported distances against recalculated distances from heap tuples. Floating-point precision differences cause spurious errors.

6. Use-after-free in endscan

hnsw_endscan unconditionally calls Box::from_raw on scan->opaque without checking for null, risking a double-free if called after a rescan.
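The guard can be sketched in plain Rust as follows. `ScanDesc`, `HnswScanState`, `begin_scan`, `result_count` and `end_scan` are stand-ins invented for this example, not the pgrx types. The state is freed only when the pointer is non-null, and the pointer is cleared afterwards so that a second call is a no-op.

```rust
use std::ptr;

/// Hypothetical per-scan state kept behind the scan's opaque pointer.
struct HnswScanState {
    results: Vec<u64>,
}

/// Plain-Rust stand-in for the `opaque` slot of a scan descriptor.
struct ScanDesc {
    opaque: *mut HnswScanState,
}

/// Allocate the state and hand ownership to the raw pointer.
fn begin_scan() -> ScanDesc {
    let state = Box::new(HnswScanState { results: Vec::new() });
    ScanDesc { opaque: Box::into_raw(state) }
}

/// Number of buffered results, or 0 if the state has already been released.
fn result_count(scan: &ScanDesc) -> usize {
    if scan.opaque.is_null() {
        return 0;
    }
    // SAFETY: a non-null `opaque` always comes from Box::into_raw in begin_scan.
    let state: &HnswScanState = unsafe { &*scan.opaque };
    state.results.len()
}

/// Free the state at most once. Returns true if something was freed.
fn end_scan(scan: &mut ScanDesc) -> bool {
    if scan.opaque.is_null() {
        return false;
    }
    // SAFETY: the pointer is non-null, was produced by Box::into_raw,
    // and is nulled below so it cannot be reclaimed twice.
    unsafe { drop(Box::from_raw(scan.opaque)) };
    scan.opaque = ptr::null_mut();
    true
}
```

Calling `end_scan` twice on the same descriptor frees the state on the first call and returns false on the second instead of reclaiming the same allocation again.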

Environment

  • PostgreSQL 17.7
  • pgrx 0.12.9
  • ruvector-postgres 2.0.1
  • Linux x86_64

Fix

PR #181 addresses all six issues. The same xs_orderbyvals allocation fix is also applied to ivfflat_ambeginscan.
